======================================================================== * README ======================================================================== Copyright 1991, 1996, 1999, 2000, 2007 Free Software Foundation, Inc. This file is part of the GNU MP Library. The GNU MP Library is free software; you can redistribute it and/or modify it under the terms of either: * the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. or * the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. or both in parallel, as here. The GNU MP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with the GNU MP Library. If not, see https://www.gnu.org/licenses/. THE GNU MP LIBRARY GNU MP is a library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating point numbers. It has a rich set of functions, and the functions have a regular interface. GNU MP is designed to be as fast as possible, both for small operands and huge operands. The speed is achieved by using fullwords as the basic arithmetic type, by using fast algorithms, with carefully optimized assembly code for the most common inner loops for lots of CPUs, and by a general emphasis on speed (instead of simplicity or elegance). GNU MP is believed to be faster than any other similar library. Its advantage increases with operand sizes for certain operations, since GNU MP in many cases has asymptotically faster algorithms. GNU MP is free software and may be freely copied on the terms contained in the files COPYING* (see the manual for information on which license(s) applies to which components of GNU MP). OVERVIEW OF GNU MP There are five classes of functions in GNU MP. 1. Signed integer arithmetic functions (mpz). These functions are intended to be easy to use, with their regular interface. The associated type is `mpz_t'. 2. Rational arithmetic functions (mpq). For now, just a small set of functions necessary for basic rational arithmetics. The associated type is `mpq_t'. 3. Floating-point arithmetic functions (mpf). If the C type `double' doesn't give enough precision for your application, declare your variables as `mpf_t' instead, set the precision to any number desired, and call the functions in the mpf class for the arithmetic operations. 4. Positive-integer, hard-to-use, very low overhead functions are in the mpn class. No memory management is performed. The caller must ensure enough space is available for the results. The set of functions is not regular, nor is the calling interface. These functions accept input arguments in the form of pairs consisting of a pointer to the least significant word, and an integral size telling how many limbs (= words) the pointer points to. Almost all calculations, in the entire package, are made by calling these low-level functions. 5. Berkeley MP compatible functions. To use these functions, include the file "mp.h". You can test if you are using the GNU version by testing if the symbol __GNU_MP__ is defined. For more information on how to use GNU MP, please refer to the documentation. It is composed from the file doc/gmp.texi, and can be displayed on the screen or printed. How to do that, as well how to build the library, is described in the INSTALL file in this directory. REPORTING BUGS If you find a bug in the library, please make sure to tell us about it! You should first check the GNU MP web pages at https://gmplib.org/, under "Status of the current release". There will be patches for all known serious bugs there. Report bugs to gmp-bugs@gmplib.org. What information is needed in a useful bug report is described in the manual. The same address can be used for suggesting modifications and enhancements. ---------------- Local variables: mode: text fill-column: 78 End: ======================================================================== * demos/calc/README ======================================================================== Copyright 2001 Free Software Foundation, Inc. This file is part of the GNU MP Library. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/. DEMONSTRATION CALCULATOR PROGRAM This is a simple program, meant only to show one way to use GMP with yacc and lex to make a calculator. Usage and comments on the implementation can be found in calc.y. Within a GMP build tree, the generated Makefile can be used to build the program, make calc (or on a DOS system, "make calc.exe"). Elsewhere, once GMP has been installed, the program can be compiled with for instance gcc calc.c calclex.c -lgmp -o calc Or if GNU readline is used then gcc calc.c calclex.c calcread.c -lgmp -lreadline -o calc (again, on a DOS system "-o calc.exe"). Readline support can be enabled or disabled in calc-config.h. That file is created by the GMP ./configure based on the --with-readline option. The default is --with-readline=detect, which means to use readline if available. "yes" can be used to force it to be used, or "no" to not use it. The supplied calc.c was generated by GNU bison, but a standard yacc should work too. The supplied calclex.c was generated by GNU flex, but a standard lex should work too. The readline support may or may not work with a standard lex (see comments with input() in calcread.c). Note also that a standard lex will require its library "-ll" on the compile command line. "./configure" sets this up in the GMP build tree Makefile. ---------------- Local variables: mode: text fill-column: 76 End: ======================================================================== * demos/expr/README ======================================================================== Copyright 2001, 2004 Free Software Foundation, Inc. This file is part of the GNU MP Library. The GNU MP Library is free software; you can redistribute it and/or modify it under the terms of either: * the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. or * the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. or both in parallel, as here. The GNU MP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with the GNU MP Library. If not, see https://www.gnu.org/licenses/. GMP EXPRESSION EVALUATION ------------------------- THIS CODE IS PRELIMINARY AND MAY BE SUBJECT TO INCOMPATIBLE CHANGES IN FUTURE VERSIONS OF GMP. The files in this directory implement a simple scheme of string based expression parsing and evaluation, supporting mpz, mpq and mpf. This will be slower than direct GMP library calls, but may be convenient in various circumstances, such as while prototyping, or for letting a user enter values in symbolic form. "2**5723-7" for example is a lot easier to enter or maintain than the equivalent written out in decimal. BUILDING Nothing in this directory is a normal part of libgmp, and nothing is built or installed, but various Makefile rules are available to compile everything. All the functions are available through a little library (there's no shared library since upward binary compatibility is not guaranteed). make libexpr.a In a program, prototypes are available using #include "expr.h" run-expr.c is a sample program doing evaluations from the command line. make run-expr ./run-expr '1+2*3' t-expr.c is self-test program, it prints nothing if successful. make t-expr ./t-expr The expr*.c sources don't depend on gmp-impl.h and can be compiled with just a standard installed GMP. This isn't true of t-expr though, since it uses some of the internal tests/libtests.la. SIMPLE USAGE int mpz_expr (mpz_t res, int base, const char *e, ...); int mpq_expr (mpq_t res, int base, const char *e, ...); int mpf_expr (mpf_t res, int base, const char *e, ...); These functions evaluate simple arithmetic expressions. For example, mpz_expr (result, 0, "123+456", NULL); Numbers are parsed by mpz_expr and mpq_expr the same as mpz_set_str with the given base. mpf_expr follows mpf_set_str, but supporting an "0x" prefix for hex when base==0. mpz_expr (result, 0, "0xAAAA * 0x5555", NULL); White space, as indicated by isspace(), is ignored except for the purpose of separating tokens. Variables can be included in expressions by putting them in the stdarg list after the string. "a", "b", "c" etc in the expression string designate those values. For example, mpq_t foo, bar; ... mpq_expr (q, 10, "2/3 + 1/a + b/2", foo, bar, NULL); Here "a" will be the value from foo and "b" from bar. Up to 26 variables can be included this way. The NULL must be present to indicate the end of the list. Variables can also be written "$a", "$b" etc. This is necessary when using bases greater than 10 since plain "a", "b" etc will otherwise be interpreted as numbers. For example, mpf_t quux; mpf_expr (f, 16, "F00F@-6 * $a", quux, NULL); All the standard C operators are available, with the usual precedences, plus "**" for exponentiation at the highest precedence (and right associative). Operators Precedence ** 220 ~ ! - (unary) 210 * / % 200 + - 190 << >> 180 <= < >= > 170 == != 160 & 150 ^ 140 | 130 && 120 || 110 ? : 100/101 Currently only mpz_expr has the bitwise ~ % & ^ and | operators. The precedence numbers are of interest in the advanced usage described below. Various functions are available too. For example, mpz_expr (res, 10, "gcd(123,456,789) * abs(a)", var, NULL); The following is the full set of functions, mpz_expr abs bin clrbit cmp cmpabs congruent_p divisible_p even_p fib fac gcd hamdist invert jacobi kronecker lcm lucnum max min nextprime odd_p perfect_power_p perfect_square_p popcount powm probab_prime_p root scan0 scan1 setbit sgn sqrt mpq_expr abs, cmp, den, max, min, num, sgn mpf_expr abs, ceil, cmp, eq, floor, integer_p, max, min, reldiff, sgn, sqrt, trunc All these are the same as the GMP library functions, except that min and max don't exist in the library. Note also that min, max, gcd and lcm take any number of arguments, not just two. mpf_expr does all calculations to the precision of the destination variable. Expression parsing can succeed or fail. The return value indicates this, and will be one of the following MPEXPR_RESULT_OK MPEXPR_RESULT_BAD_VARIABLE MPEXPR_RESULT_BAD_TABLE MPEXPR_RESULT_PARSE_ERROR MPEXPR_RESULT_NOT_UI BAD_VARIABLE is when a variable is referenced that hasn't been provided. For example if "c" is used when only two parameters have been passed. BAD_TABLE is applicable to the advanced usage described below. PARSE_ERROR is a general syntax error, returned for any mal-formed input string. NOT_UI is returned when an attempt is made to use an operand that's bigger than an "unsigned long" with a function that's restricted to that range. For example "fib" is mpz_fib_ui and only accepts an "unsigned long". ADVANCED USAGE int mpz_expr_a (const struct mpexpr_operator_t *table, mpz_ptr res, int base, const char *e, size_t elen, mpz_srcptr var[26]) int mpq_expr_a (const struct mpexpr_operator_t *table, mpq_ptr res, int base, const char *e, size_t elen, mpq_srcptr var[26]) int mpf_expr_a (const struct mpexpr_operator_t *table, mpf_ptr res, int base, unsigned long prec, const char *e, size_t elen, mpf_srcptr var[26]) These functions are an advanced interface to expression parsing. The string is taken as pointer and length. This makes it possible to parse an expression in the middle of somewhere without copying and null terminating it. Variables are an array of 26 pointers to the appropriate operands, or NULL for variables that are not available. Any combination of variables can be given, for example just "x" and "y" (var[23] and var[24]) could be set. Operators and functions are specified with a table. This makes it possible to provide additional operators or functions, or to completely change the syntax. The standard tables used by the simple functions above are available as const struct mpexpr_operator_t * const mpz_expr_standard_table; const struct mpexpr_operator_t * const mpq_expr_standard_table; const struct mpexpr_operator_t * const mpf_expr_standard_table; struct mpexpr_operator_t is the following struct mpexpr_operator_t { const char *name; mpexpr_fun_t fun; int type; int precedence; }; typedef void (*mpexpr_fun_t) (void); As an example, the standard mpz_expr table entry for multiplication is as follows. See the source code for the full set of standard entries. { "*", (mpexpr_fun_t) mpz_mul, MPEXPR_TYPE_BINARY, 200 }, "name" is the string to parse, "fun" is the function to call for it, "type" indicates what parameters the function takes (among other things), and "precedence" sets its operator precedence. A NULL for "name" indicates the end of the table, so for example an mpf table with nothing but addition could be struct mpexpr_operator_t table[] = { { "+", (mpexpr_fun_t) mpf_add, MPEXPR_TYPE_BINARY, 190 }, { NULL } }; A special type MPEXPR_TYPE_NEW_TABLE makes it possible to chain from one table to another. For example the following would add a "mod" operator to the standard mpz table, struct mpexpr_operator_t table[] = { { "mod", (mpexpr_fun_t) mpz_fdiv_r, MPEXPR_TYPE_BINARY, 125 }, { (const char *) mpz_expr_standard_table, NULL, MPEXPR_TYPE_NEW_TABLE } }; Notice the low precedence on "mod", so that for instance "45+26 mod 7" parses as "(45+26)mod7". Functions are designated by a precedence of 0. They always occur as "foo(expr)" and so have no need for a precedence level. mpq_abs in the standard mpq table is { "abs", (mpexpr_fun_t) mpq_abs, MPEXPR_TYPE_UNARY }, Functions expecting no arguments as in "foo()" can be given with MPEXPR_TYPE_0ARY, or actual constants to be parsed as just "foo" are MPEXPR_TYPE_CONSTANT. For example if a "void mpf_const_pi(mpf_t f)" function existed (which it doesn't) it could be, { "pi", (mpexpr_fun_t) mpf_const_pi, MPEXPR_TYPE_CONSTANT }, Parsing of operator names is done by seeking the table entry with the longest matching name. So for instance operators "<" and "<=" exist, and when presented with "x <= y" the parser matches "<=" because it's longer. Parsing of function names, on the other hand, is done by requiring a whole alphanumeric word to match. For example presented with "fib2zz(5)" the parser will attempt to find a function called "fib2zz". A function "fib" wouldn't be used because it doesn't match the whole word. The flag MPEXPR_TYPE_WHOLEWORD can be ORed into an operator type to override the default parsing style. Similarly MPEXPR_TYPE_OPERATOR into a function. Binary operators are left associative by default, meaning they're evaluated from left to right, so for example "1+2+3" is treated as "(1+2)+3". MPEXPR_TYPE_RIGHTASSOC can be ORed into the operator type to work from right to left as in "1+(2+3)". This is generally what's wanted for exponentiation, and for example the standard mpz table has { "**", (mpexpr_fun_t) mpz_pow_ui, MPEXPR_TYPE_BINARY_UI | MPEXPR_TYPE_RIGHTASSOC, 220 } Unary operators are postfix by default. For example a factorial to be used as "123!" might be { "!", (mpexpr_fun_t) mpz_fac_ui, MPEXPR_TYPE_UNARY_UI, 215 } MPEXPR_TYPE_PREFIX can be ORed into the type to get a prefix operator. For instance negation (unary minus) in the standard mpf table is { "-", (mpexpr_fun_t) mpf_neg, MPEXPR_TYPE_UNARY | MPEXPR_TYPE_PREFIX, 210 }, The same operator can exist as a prefix unary and a binary, or as a prefix and postfix unary, simply by putting two entries in the table. While parsing the context determines which style is sought. But note that the same operator can't be both a postfix unary and a binary, since the parser doesn't try to look ahead to decide which ought to be used. When there's two entries for an operator, both prefix or both postfix (or binary), then the first in the table will be used. This makes it possible to override an entry in a standard table, for example to change the function it calls, or perhaps its precedence level. The following would change mpz division from tdiv to cdiv, struct mpexpr_operator_t table[] = { { "/", (mpexpr_fun_t) mpz_cdiv_q, MPEXPR_TYPE_BINARY, 200 }, { "%", (mpexpr_fun_t) mpz_cdiv_r, MPEXPR_TYPE_BINARY, 200 }, { (char *) mpz_expr_standard_table, NULL, MPEXPR_TYPE_NEW_TABLE } }; The type field indicates what parameters the given function expects. The following styles of functions are supported. mpz_t is shown, but of course this is mpq_t for mpq_expr_a, mpf_t for mpf_expr_a, etc. MPEXPR_TYPE_CONSTANT void func (mpz_t result); MPEXPR_TYPE_0ARY void func (mpz_t result); MPEXPR_TYPE_I_0ARY int func (void); MPEXPR_TYPE_UNARY void func (mpz_t result, mpz_t op); MPEXPR_TYPE_UNARY_UI void func (mpz_t result, unsigned long op); MPEXPR_TYPE_I_UNARY int func (mpz_t op); MPEXPR_TYPE_I_UNARY_UI int func (unsigned long op); MPEXPR_TYPE_BINARY void func (mpz_t result, mpz_t op1, mpz_t op2); MPEXPR_TYPE_BINARY_UI void func (mpz_t result, mpz_t op1, unsigned long op2); MPEXPR_TYPE_I_BINARY int func (mpz_t op1, mpz_t op2); MPEXPR_TYPE_I_BINARY_UI int func (mpz_t op1, unsigned long op2); MPEXPR_TYPE_TERNARY void func (mpz_t result, mpz_t op1, mpz_t op2, mpz_t op3); MPEXPR_TYPE_TERNARY_UI void func (mpz_t result, mpz_t op1, mpz_t op2, unsigned long op3); MPEXPR_TYPE_I_TERNARY int func (mpz_t op1, mpz_t op2, mpz_t op3); MPEXPR_TYPE_I_TERNARY_UI int func (mpz_t op1, mpz_t op2, unsigned long op3); Notice the pattern of "UI" for the last parameter as an unsigned long, or "I" for the result as an "int" return value. It's important that the declared type for an operator or function matches the function pointer given. Any mismatch will have unpredictable results. For binary functions, a further type attribute is MPEXPR_TYPE_PAIRWISE which indicates that any number of arguments should be accepted, and evaluated by applying the given binary function to them pairwise. This is used by gcd, lcm, min and max. For example the standard mpz gcd is { "gcd", (mpexpr_fun_t) mpz_gcd, MPEXPR_TYPE_BINARY | MPEXPR_TYPE_PAIRWISE }, Some special types exist for comparison operators (or functions). MPEXPR_TYPE_CMP_LT through MPEXPR_TYPE_CMP_GE expect an MPEXPR_TYPE_I_BINARY function, returning positive, negative or zero like mpz_cmp and similar. For example the standard mpf "!=" operator is { "!=", (mpexpr_fun_t) mpf_cmp, MPEXPR_TYPE_CMP_NE, 160 }, But there's no obligation to use these types, for instance the standard mpq table just uses a plain MPEXPR_TYPE_I_BINARY and mpq_equal for "==". Further special types MPEXPR_TYPE_MIN and MPEXPR_TYPE_MAX exist to implement the min and max functions, and they take a function like mpf_cmp similarly. The standard mpf max function is { "max", (mpexpr_fun_t) mpf_cmp, MPEXPR_TYPE_MAX | MPEXPR_TYPE_PAIRWISE }, These can be used as operators too, for instance the following would be the >? operator which is a feature of GNU C++, { ">?", (mpexpr_fun_t) mpf_cmp, MPEXPR_TYPE_MAX, 175 }, Other special types are used to define "(" ")" parentheses, "," function argument separator, "!" through "||" logical booleans, ternary "?" ":", and the "$" which introduces variables. See the sources for how they should be used. User definable operator tables will have various uses. For example, - a subset of the C operators, to be rid of infrequently used things - a more mathematical syntax like "." for multiply, "^" for powering, and "!" for factorial - a boolean evaluator with "^" for AND, "v" for OR - variables introduced with "%" instead of "$" - brackets as "[" and "]" instead of "(" and ")" The only fixed parts of the parsing are the treatment of numbers, whitespace and the two styles of operator/function name recognition. As a final example, the following would be a complete mpz table implementing some operators with a more mathematical syntax. Notice there's no need to preserve the standard precedence values, anything can be used so long as they're in the desired relation to each other. There's also no need to have entries in precedence order, but it's convenient to do so to show what comes where. static const struct mpexpr_operator_t table[] = { { "^", (mpexpr_fun_t) mpz_pow_ui, MPEXPR_TYPE_BINARY_UI | MPEXPR_TYPE_RIGHTASSOC, 9 }, { "!", (mpexpr_fun_t) mpz_fac_ui, MPEXPR_TYPE_UNARY_UI, 8 }, { "-", (mpexpr_fun_t) mpz_neg, MPEXPR_TYPE_UNARY | MPEXPR_TYPE_PREFIX, 7 }, { "*", (mpexpr_fun_t) mpz_mul, MPEXPR_TYPE_BINARY, 6 }, { "/", (mpexpr_fun_t) mpz_fdiv_q, MPEXPR_TYPE_BINARY, 6 }, { "+", (mpexpr_fun_t) mpz_add, MPEXPR_TYPE_BINARY, 5 }, { "-", (mpexpr_fun_t) mpz_sub, MPEXPR_TYPE_BINARY, 5 }, { "mod", (mpexpr_fun_t) mpz_mod, MPEXPR_TYPE_BINARY, 6 }, { ")", NULL, MPEXPR_TYPE_CLOSEPAREN, 4 }, { "(", NULL, MPEXPR_TYPE_OPENPAREN, 3 }, { ",", NULL, MPEXPR_TYPE_ARGSEP, 2 }, { "$", NULL, MPEXPR_TYPE_VARIABLE, 1 }, { NULL } }; INTERNALS Operator precedence is implemented using a control and data stack, there's no C recursion. When an expression like 1+2*3 is read the "+" is held on the control stack and 1 on the data stack until "*" has been parsed and applied to 2 and 3. This happens any time a higher precedence operator follows a lower one, or when a right-associative operator like "**" is repeated. Parentheses are handled by making "(" a special prefix unary with a low precedence so a whole following expression is read. The special operator ")" knows to discard the pending "(". Function arguments are handled similarly, with the function pretending to be a low precedence prefix unary operator, and with "," allowed within functions. The same special ")" operator recognises a pending function and will invoke it appropriately. The ternary "? :" operator is also handled using precedences. ":" is one level higher than "?", so when a valid a?b:c is parsed the ":" finds a "?" on the control stack. It's a parse error for ":" to find anything else. FUTURE The ternary "?:" operator evaluates the "false" side of its pair, which is wasteful, though it ought to be harmless. It'd be better if it could evaluate only the "true" side. Similarly for the logical booleans "&&" and "||" if they know their result already. Functions like MPEXPR_TYPE_BINARY could return a status indicating operand out of range or whatever, to get an error back through mpz_expr etc. That would want to be just an option, since plain mpz_add etc have no such return. Could have assignments like "a = b*c" modifying the input variables. Assignment could be an operator attribute, making it expect an lvalue. There would want to be a standard table without assignments available though, so user input could be safely parsed. The closing parenthesis table entry could specify the type of open paren it expects, so that "(" and ")" could match and "[" and "]" match but not a mixture of the two. Currently "[" and "]" can be added, but there's no error on writing a mixed expression like "2*(3+4]". Maybe also there could be a way to say that functions can only be written with one or the other style of parens. ---------------- Local variables: mode: text fill-column: 76 End: ======================================================================== * mini-gmp/README ======================================================================== Copyright 2011-2013 Free Software Foundation, Inc. This file is part of the GNU MP Library. The GNU MP Library is free software; you can redistribute it and/or modify it under the terms of either: * the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. or * the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. or both in parallel, as here. The GNU MP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with the GNU MP Library. If not, see https://www.gnu.org/licenses/. This is "mini-gmp", a small implementation of a subset of GMP's mpn and mpz interfaces. It is intended for applications which need arithmetic on numbers larger than a machine word, but which don't need to handle very large numbers very efficiently. Those applications can include a copy of mini-gmp to get a GMP-compatible interface with small footprint. One can also arrange for optional linking with the real GMP library, using mini-gmp as a fallback when for some reason GMP is not available, or not desired as a dependency. The supported GMP subset is declared in mini-gmp.h. The implemented functions are fully compatible with the corresponding GMP functions, as specified in the GMP manual, with a few exceptions: mpz_set_str, mpz_init_set_str, mpz_get_str, mpz_out_str and mpz_sizeinbase support only |base| <= 36; mpz_export and mpz_import support only NAILS = 0. The REALLOC_FUNC and FREE_FUNC registered with mp_set_memory_functions does not get the correct size of the allocated block in the corresponding argument. mini-gmp always passes zero for these rarely used arguments. The implementation is a single file, mini-gmp.c. The performance target for mini-gmp is to be at most 10 times slower than the real GMP library, for numbers of size up to a few hundred bits. No asymptotically fast algorithms are included in mini-gmp, so it will be many orders of magnitude slower than GMP for very large numbers. You should never "install" mini-gmp. Applications can either just #include mini-gmp.c (but then, beware that it defines several macros and functions outside of the advertised interface). Or compile mini-gmp.c as a separate compilation unit, and use the declarations in mini-gmp.h. The tests subdirectory contains a testsuite. To use it, you need GMP and GNU make. Just run make check in the tests directory. If the hard-coded compiler settings are not right, you have to either edit the Makefile or pass overriding values on the make command line (e.g., make CC=cc check). Testing is not (yet) as thorough as for the real GMP. The current version was put together by Niels Möller , with a fair amount of copy-and-paste from the GMP sources. ======================================================================== * mpn/README ======================================================================== Copyright 1996, 1999 Free Software Foundation, Inc. This file is part of the GNU MP Library. The GNU MP Library is free software; you can redistribute it and/or modify it under the terms of either: * the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. or * the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. or both in parallel, as here. The GNU MP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with the GNU MP Library. If not, see https://www.gnu.org/licenses/. This directory contains all code for the mpn layer of GMP. Most subdirectories contain machine-dependent code, written in assembly or C. The `generic' subdirectory contains default code, used when there is no machine-dependent replacement for a particular machine. There is one subdirectory for each ISA family. Note that e.g., 32-bit SPARC and 64-bit SPARC are very different ISA's, and thus cannot share any code. A particular compile will only use code from one subdirectory, and the `generic' subdirectory. The ISA-specific subdirectories contain hierachies of directories for various architecture variants and implementations; the top-most level contains code that runs correctly on all variants. ======================================================================== * mpn/alpha/README ======================================================================== Copyright 1996, 1997, 1999-2005 Free Software Foundation, Inc. This file is part of the GNU MP Library. The GNU MP Library is free software; you can redistribute it and/or modify it under the terms of either: * the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. or * the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. or both in parallel, as here. The GNU MP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with the GNU MP Library. If not, see https://www.gnu.org/licenses/. This directory contains mpn functions optimized for DEC Alpha processors. ALPHA ASSEMBLY RULES AND REGULATIONS The `.prologue N' pseudo op marks the end of instruction that needs special handling by unwinding. It also says whether $27 is really needed for computing the gp. The `.mask M' pseudo op says which registers are saved on the stack, and at what offset in the frame. Cray T3 code is very very different... "$6" / "$f6" etc is the usual syntax for registers, but on Unicos instead "r6" / "f6" is required. We use the "r6" / "f6" forms, and have m4 defines expand them to "$6" or "$f6" where necessary. "0x" introduces a hex constant in gas and DEC as, but on Unicos "^X" is required. The X() macro accommodates this difference. "cvttqc" is required by DEC as, "cvttq/c" is required by Unicos, and gas will accept either. We use cvttqc and have an m4 define expand to cvttq/c where necessary. "not" as an alias for "ornot r31, ..." is available in gas and DEC as, but not the Unicos assembler. The full "ornot" must be used. "unop" is not available in Unicos. We make an m4 define to the usual "ldq_u r31,0(r30)", and in fact use that define on all systems since it comes out the same. "!literal!123" etc explicit relocations as per Tru64 4.0 are apparently not available in older alpha assemblers (including gas prior to 2.12), according to the GCC manual, so the assembler macro forms must be used (eg. ldgp). RELEVANT OPTIMIZATION ISSUES EV4 1. This chip has very limited store bandwidth. The on-chip L1 cache is write- through, and a cache line is transferred from the store buffer to the off- chip L2 in as much 15 cycles on most systems. This delay hurts mpn_add_n, mpn_sub_n, mpn_lshift, and mpn_rshift. 2. Pairing is possible between memory instructions and integer arithmetic instructions. 3. mulq and umulh are documented to have a latency of 23 cycles, but 2 of these cycles are pipelined. Thus, multiply instructions can be issued at a rate of one each 21st cycle. EV5 1. The memory bandwidth of this chip is good, both for loads and stores. The L1 cache can handle two loads or one store per cycle, but two cycles after a store, no ld can issue. 2. mulq has a latency of 12 cycles and an issue rate of 1 each 8th cycle. umulh has a latency of 14 cycles and an issue rate of 1 each 10th cycle. (Note that published documentation gets these numbers slightly wrong.) 3. mpn_add_n. With 4-fold unrolling, we need 37 instructions, whereof 12 are memory operations. This will take at least ceil(37/2) [dual issue] + 1 [taken branch] = 19 cycles We have 12 memory cycles, plus 4 after-store conflict cycles, or 16 data cache cycles, which should be completely hidden in the 19 issue cycles. The computation is inherently serial, with these dependencies: ldq ldq \ /\ (or) addq | |\ / \ | | addq cmpult \ | | cmpult | \ / or I.e., 3 operations are needed between carry-in and carry-out, making 12 cycles the absolute minimum for the 4 limbs. We could replace the `or' with a cmoveq/cmovne, which could issue one cycle earlier that the `or', but that might waste a cycle on EV4. The total depth remain unaffected, since cmov has a latency of 2 cycles. addq / \ addq cmpult | \ cmpult -> cmovne Montgomery has a slightly different way of computing carry that requires one less instruction, but has depth 4 (instead of the current 3). Since the code is currently instruction issue bound, Montgomery's idea should save us 1/2 cycle per limb, or bring us down to a total of 17 cycles or 4.25 cycles/limb. Unfortunately, this method will not be good for the EV6. 4. addmul_1 and friends: We previously had a scheme for splitting the single- limb operand in 21-bits chunks and the multi-limb operand in 32-bit chunks, and then use FP operations for every 2nd multiply, and integer operations for every 2nd multiply. But it seems much better to split the single-limb operand in 16-bit chunks, since we save many integer shifts and adds that way. See powerpc64/README for some more details. EV6 Here we have a really parallel pipeline, capable of issuing up to 4 integer instructions per cycle. In actual practice, it is never possible to sustain more than 3.5 integer insns/cycle due to rename register shortage. One integer multiply instruction can issue each cycle. To get optimal speed, we need to pretend we are vectorizing the code, i.e., minimize the depth of recurrences. There are two dependencies to watch out for. 1) Address arithmetic dependencies, and 2) carry propagation dependencies. We can avoid serializing due to address arithmetic by unrolling loops, so that addresses don't depend heavily on an index variable. Avoiding serializing because of carry propagation is trickier; the ultimate performance of the code will be determined of the number of latency cycles it takes from accepting carry-in to a vector point until we can generate carry-out. Most integer instructions can execute in either the L0, U0, L1, or U1 pipelines. Shifts only execute in U0 and U1, and multiply only in U1. CMOV instructions split into two internal instructions, CMOV1 and CMOV2. CMOV split the mapping process (see pg 2-26 in cmpwrgd.pdf), suggesting the CMOV should always be placed as the last instruction of an aligned 4 instruction block, or perhaps simply avoided. Perhaps the most important issue is the latency between the L0/U0 and L1/U1 clusters; a result obtained on either cluster has an extra cycle of latency for consumers in the opposite cluster. Because of the dynamic nature of the implementation, it is hard to predict where an instruction will execute. REFERENCES "Alpha Architecture Handbook", version 4, Compaq, October 1998, order number EC-QD2KC-TE. "Alpha 21164 Microprocessor Hardware Reference Manual", Compaq, December 1998, order number EC-QP99C-TE. "Alpha 21264/EV67 Microprocessor Hardware Reference Manual", revision 1.4, Compaq, September 2000, order number DS-0028B-TE. "Compiler Writer's Guide for the Alpha 21264", Compaq, June 1999, order number EC-RJ66A-TE. All of the above are available online from http://ftp.digital.com/pub/Digital/info/semiconductor/literature/dsc-library.html ftp://ftp.compaq.com/pub/products/alphaCPUdocs "Tru64 Unix Assembly Language Programmer's Guide", Compaq, March 1996, part number AA-PS31D-TE. "Digital UNIX Calling Standard for Alpha Systems", Digital Equipment Corp, March 1996, part number AA-PY8AC-TE. The above are available online, http://h30097.www3.hp.com/docs/pub_page/V40F_DOCS.HTM (Dunno what h30097 means in this URL, but if it moves try searching for "tru64 online documentation" from the main www.hp.com page.) ---------------- Local variables: mode: text fill-column: 79 End: ======================================================================== * mpn/alpha/ev6/nails/README ======================================================================== Copyright 2002, 2005 Free Software Foundation, Inc. This file is part of the GNU MP Library. The GNU MP Library is free software; you can redistribute it and/or modify it under the terms of either: * the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. or * the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. or both in parallel, as here. The GNU MP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with the GNU MP Library. If not, see https://www.gnu.org/licenses/. This directory contains assembly code for nails-enabled 21264. The code is not very well optimized. For addmul_N, as N grows larger, we could make multiple loads together, then do about 3.3 i/c. 10 cycles after the last load, we can increase to 4 i/c. This would surely allow addmul_4 to run at 2 c/l, but the same should be possible also for addmul_3 and perhaps even addmul_2. current fair best Routine c/l unroll c/l unroll c/l i/c mul_1 3.25 2.75 2.75 3.273 addmul_1 4.0 4 3.5 4 14 3.25 3.385 addmul_2 4.0 1 2.5 2 10 2.25 3.333 addmul_3 3.0 1 2.33 2 14 2 3.333 addmul_4 2.5 1 2.125 2 17 2 3.135 addmul_5 2 1 10 addmul_6 2 1 12 addmul_7 2 1 14 (The "best" column doesn't account for bookkeeping instructions and thereby assumes infinite unrolling.) Basecase usages: 1 addmul_1 2 addmul_2 3 addmul_3 4 addmul_4 5 addmul_3 + addmul_2 2.3998 6 addmul_4 + addmul_2 7 addmul_4 + addmul_3 ======================================================================== * mpn/arm/README ======================================================================== Copyright 2002, 2012 Free Software Foundation, Inc. This file is part of the GNU MP Library. The GNU MP Library is free software; you can redistribute it and/or modify it under the terms of either: * the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. or * the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. or both in parallel, as here. The GNU MP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with the GNU MP Library. If not, see https://www.gnu.org/licenses/. This directory contains mpn functions for ARM processors. It has been optimised for Cortex-A9, but the code in the top-level directory should run on all ARM processors at architecture level v4 or later. ======================================================================== * mpn/cray/README ======================================================================== Copyright 2000-2002 Free Software Foundation, Inc. This file is part of the GNU MP Library. The GNU MP Library is free software; you can redistribute it and/or modify it under the terms of either: * the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. or * the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. or both in parallel, as here. The GNU MP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with the GNU MP Library. If not, see https://www.gnu.org/licenses/. The code in this directory works for Cray vector systems such as C90, J90, T90 (both the CFP variant and the IEEE variant) and SV1. (For the T3E and T3D systems, see the `alpha' subdirectory at the same level as the directory containing this file.) The cfp subdirectory is for systems utilizing the traditional Cray floating-point format, and the ieee subdirectory is for the newer systems that use the IEEE floating-point format. There are several issues that reduces speed on Cray systems. For systems with cfp floating point, the main obstacle is the forming of 128-bit products. For IEEE systems, adding, and in particular computing carry is the main issue. There are no vectorizing unsigned-less-than instructions, and the sequence that implement that operation is very long. Shifting is the only operation that is simple to make fast. All Cray systems have a bitblt instructions (Vi Vj,VjAk) that should be really useful. For best speed for cfp systems, we need a mul_basecase, since that reduces the need for carry propagation to a minimum. Depending on the size (vn) of the smaller of the two operands (V), we should split U and V in different chunk sizes: U split in 2 32-bit parts V split according to the table: parts 4 5 6 7 8 bits/part 16 13 11 10 8 max allowed vn 1 8 32 64 256 number of multiplies 8 10 12 14 16 peak cycles/limb 4 5 6 7 8 U split in 3 22-bit parts V split according to the table: parts 3 4 5 bits/part 22 16 13 max allowed vn 16 1024 8192 number of multiplies 9 12 15 peak cycles/limb 4.5 6 7.5 U split in 4 16-bit parts V split according to the table: parts 4 bits/part 16 max allowed vn 65536 number of multiplies 16 peak cycles/limb 8 (A T90 CPU can accumulate two products per cycle.) IDEA: * Rewrite mpn_add_n: short cy[n + 1]; #pragma _CRI ivdep for (i = 0; i < n; i++) { s = up[i] + vp[i]; rp[i] = s; cy[i + 1] = s < up[i]; } more_carries = 0; #pragma _CRI ivdep for (i = 1; i < n; i++) { s = rp[i] + cy[i]; rp[i] = s; more_carries += s < cy[i]; } cys = 0; if (more_carries) { cys = rp[1] < cy[1]; for (i = 2; i < n; i++) { rp[i] += cys; cys = rp[i] < cys; } } return cys + cy[n]; * Write mpn_add3_n for adding three operands. First add operands 1 and 2, and generate cy[]. Then add operand 3 to the partial result, and accumulate carry into cy[]. Finally propagate carry just like in the new mpn_add_n. IDEA: Store fewer bits, perhaps 62, per limb. That brings mpn_add_n time down to 2.5 cycles/limb and mpn_addmul_1 times to 4 cycles/limb. By storing even fewer bits per limb, perhaps 56, it would be possible to write a mul_mul_basecase that would run at effectively 1 cycle/limb. (Use VM here to better handle the romb-shaped multiply area, perhaps rounding operand sizes up to the next power of 2.) ======================================================================== * mpn/ia64/README ======================================================================== Copyright 2000-2005 Free Software Foundation, Inc. This file is part of the GNU MP Library. The GNU MP Library is free software; you can redistribute it and/or modify it under the terms of either: * the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. or * the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. or both in parallel, as here. The GNU MP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with the GNU MP Library. If not, see https://www.gnu.org/licenses/. IA-64 MPN SUBROUTINES This directory contains mpn functions for the IA-64 architecture. CODE ORGANIZATION mpn/ia64 itanium-2, and generic ia64 The code here has been optimized primarily for Itanium 2. Very few Itanium 1 chips were ever sold, and Itanium 2 is more powerful, so the latter is what we concentrate on. CHIP NOTES The IA-64 ISA keeps instructions three and three in 128 bit bundles. Programmers/compilers need to put explicit breaks `;;' when there are WAW or RAW dependencies, with some notable exceptions. Such "breaks" are typically at the end of a bundle, but can be put between operations within some bundle types too. The Itanium 1 and Itanium 2 implementations can under ideal conditions execute two bundles per cycle. The Itanium 1 allows 4 of these instructions to do integer operations, while the Itanium 2 allows all 6 to be integer operations. Taken cloop branches seem to insert a bubble into the pipeline most of the time on Itanium 1. Loads to the fp registers bypass the L1 cache and thus get extremely long latencies, 9 cycles on the Itanium 1 and 6 cycles on the Itanium 2. The software pipeline stuff using br.ctop instruction causes delays, since many issue slots are taken up by instructions with zero predicates, and since many extra instructions are needed to set things up. These features are clearly designed for code density, not speed. Misc pipeline limitations (Itanium 1): * The getf.sig instruction can only execute in M0. * At most four integer instructions/cycle. * Nops take up resources like any plain instructions. Misc pipeline limitations (Itanium 2): * The getf.sig instruction can only execute in M0. * Nops take up resources like any plain instructions. ASSEMBLY SYNTAX .align pads with nops in a text segment, but gas 2.14 and earlier incorrectly byte-swaps its nop bundle in big endian mode (eg. hpux), making it come out as break instructions. We use the ALIGN() macro in mpn/ia64/ia64-defs.m4 when it might be executed across. That macro suppresses any .align if the problem is detected by configure. Lack of alignment might hurt performance but will at least be correct. foo:: to create a global symbol is not accepted by gas. Use separate ".global foo" and "foo:" instead. .global is the standard global directive. gas accepts .globl, but hpux "as" doesn't. .proc / .endp generates the appropriate .type and .size information for ELF, so the latter directives don't need to be given explicitly. .pred.rel "mutex"... is standard for annotating predicate register relationships. gas also accepts .pred.rel.mutex, but hpux "as" doesn't. .pred directives can't be put on a line with a label, like ".Lfoo: .pred ...", the HP assembler on HP-UX 11.23 rejects that. gas is happy with it, and past versions of HP had seemed ok. // is the standard comment sequence, but we prefer "C" since it inhibits m4 macro expansion. See comments in ia64-defs.m4. REGISTER USAGE Special: r0: constant 0 r1: global pointer (gp) r8: return value r12: stack pointer (sp) r13: thread pointer (tp) Caller-saves: r8-r11 r14-r31 f6-f15 f32-f127 Caller-saves but rotating: r32- ================================================================ mpn_add_n, mpn_sub_n: The current code runs at 1.25 c/l on Itanium 2. ================================================================ mpn_mul_1: The current code runs at 2 c/l on Itanium 2. Using a blocked approach, working off of 4 separate places in the operands, one could make use of the xma accumulation, and approach 1 c/l. ldf8 [up] xma.l xma.hu stf8 [wrp] ================================================================ mpn_addmul_1: The current code runs at 2 c/l on Itanium 2. It seems possible to use a blocked approach, as with mpn_mul_1. We should read rp[] to integer registers, allowing for just one getf.sig per cycle. ld8 [rp] ldf8 [up] xma.l xma.hu getf.sig add+add+cmp+cmp st8 [wrp] These 10 instructions can be scheduled to approach 1.667 cycles, and with the 4 cycle latency of xma, this means we need at least 3 blocks. Using ldfp8 we could approach 1.583 c/l. ================================================================ mpn_submul_1: The current code runs at 2.25 c/l on Itanium 2. Getting to 2 c/l requires ldfp8 with all alignment headache that implies. ================================================================ mpn_addmul_N For best speed, we need to give up using mpn_addmul_2 as the main multiply building block, and instead take multiple v limbs per loop. For the Itanium 1, we need to take about 8 limbs at a time for full speed. For the Itanium 2, something like mpn_addmul_4 should be enough. The add+cmp+cmp+add we use on the other codes is optimal for shortening recurrencies (1 cycle) but the sequence takes up 4 execution slots. When recurrency depth is not critical, a more standard 3-cycle add+cmp+add is better. /* First load the 8 values from v */ ldfp8 v0, v1 = [r35], 16;; ldfp8 v2, v3 = [r35], 16;; ldfp8 v4, v5 = [r35], 16;; ldfp8 v6, v7 = [r35], 16;; /* In the inner loop, get a new U limb and store a result limb. */ mov lc = un Loop: ldf8 u0 = [r33], 8 ld8 r0 = [r32] xma.l lp0 = v0, u0, hp0 xma.hu hp0 = v0, u0, hp0 xma.l lp1 = v1, u0, hp1 xma.hu hp1 = v1, u0, hp1 xma.l lp2 = v2, u0, hp2 xma.hu hp2 = v2, u0, hp2 xma.l lp3 = v3, u0, hp3 xma.hu hp3 = v3, u0, hp3 xma.l lp4 = v4, u0, hp4 xma.hu hp4 = v4, u0, hp4 xma.l lp5 = v5, u0, hp5 xma.hu hp5 = v5, u0, hp5 xma.l lp6 = v6, u0, hp6 xma.hu hp6 = v6, u0, hp6 xma.l lp7 = v7, u0, hp7 xma.hu hp7 = v7, u0, hp7 getf.sig l0 = lp0 getf.sig l1 = lp1 getf.sig l2 = lp2 getf.sig l3 = lp3 getf.sig l4 = lp4 getf.sig l5 = lp5 getf.sig l6 = lp6 add+cmp+add xx, l0, r0 add+cmp+add acc0, acc1, l1 add+cmp+add acc1, acc2, l2 add+cmp+add acc2, acc3, l3 add+cmp+add acc3, acc4, l4 add+cmp+add acc4, acc5, l5 add+cmp+add acc5, acc6, l6 getf.sig acc6 = lp7 st8 [r32] = xx, 8 br.cloop Loop 49 insn at max 6 insn/cycle: 8.167 cycles/limb8 11 memops at max 2 memops/cycle: 5.5 cycles/limb8 16 fpops at max 2 fpops/cycle: 8 cycles/limb8 21 intops at max 4 intops/cycle: 5.25 cycles/limb8 11+21 memops+intops at max 4/cycle 8 cycles/limb8 ================================================================ mpn_lshift, mpn_rshift The current code runs at 1 cycle/limb on Itanium 2. Using 63 separate loops, we could use the double-word shrp instruction. That instruction has a plain single-cycle latency. We need 63 loops since this instruction only accept immediate count. That would lead to a somewhat silly code size, but the speed would be 0.75 c/l on Itanium 2 (by using shrp each cycle plus shl/shr going down I1 for a further limb every second cycle). ================================================================ mpn_copyi, mpn_copyd The current code runs at 0.5 c/l on Itanium 2. But that is just for L1 cache hit. The 4-way unrolled loop takes just 2 cycles, and thus load-use scheduling isn't great. It might be best to actually use modulo scheduled loops, since that will allow us to do better load-use scheduling without too much unrolling. Depending on size or operand alignment, we get 1 c/l or 0.5 c/l on Itanium 2, according to tune/speed. Cache bank conflicts? REFERENCES Intel Itanium Architecture Software Developer's Manual, volumes 1 to 3, Intel document 245317-004, 245318-004, 245319-004 October 2002. Volume 1 includes an Itanium optimization guide. Intel Itanium Processor-specific Application Binary Interface (ABI), Intel document 245370-003, May 2001. Describes C type sizes, dynamic linking, etc. Intel Itanium Architecture Assembly Language Reference Guide, Intel document 248801-004, 2000-2002. Describes assembly instruction syntax and other directives. Itanium Software Conventions and Runtime Architecture Guide, Intel document 245358-003, May 2001. Describes calling conventions, including stack unwinding requirements. Intel Itanium Processor Reference Manual for Software Optimization, Intel document 245473-003, November 2001. Intel Itanium-2 Processor Reference Manual for Software Development and Optimization, Intel document 251110-003, May 2004. All the above documents can be found online at http://developer.intel.com/design/itanium/manuals.htm ======================================================================== * mpn/m68k/README ======================================================================== Copyright 2001, 2003, 2004 Free Software Foundation, Inc. This file is part of the GNU MP Library. The GNU MP Library is free software; you can redistribute it and/or modify it under the terms of either: * the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. or * the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. or both in parallel, as here. The GNU MP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with the GNU MP Library. If not, see https://www.gnu.org/licenses/. M68K MPN SUBROUTINES This directory contains mpn functions for various m68k family chips. CODE ORGANIZATION m68k m68000, m68010, m68060 m68k/mc68020 m68020, m68030, m68040, and CPU32 The m5200 "coldfire", which is m68000 less a few instructions, currently has no assembler code support. STATUS The code herein is old and poorly maintained. If somebody really cared, it could be optimized substantially. For example, * mpn_add_n and mpn_sub_n could, with more unrolling be improved from 6 to close to 4 c/l (on m68040). * The multiplication loops could be sped up by using the FPU. * mpn_lshift by 31 should use the special-case mpn_rshift by 1 code, and vice versa mpn_rshift by 31 use the special lshift by 1, when operand overlap permits. * On 68000, mpn_mul_1, mpn_addmul_1 and mpn_submul_1 could check for a 16-bit multiplier and use two multiplies per limb, not four. Similarly various other _1 operations like mpn_mod_1, mpn_divrem_1, mpn_divexact_1, mpn_modexact_1c_odd. * On 68000, mpn_lshift and mpn_rshift could use a roll and mask instead of lsrl and lsll. This promises to be a speedup, effectively trading a 6+2*n shift for one or two 4 cycle masks. Suggested by Jean-Charles Meyrignac. * config.guess detects 68000, 68010, CPU32 and 68020 by running some code, but relies on system information for 030, 040 and 060. Can they be identified by running some code? Currently this only makes a difference to the compiler options selected, since we have no specific asm code for those chips. One novel idea for 68000 would be to use a 16-bit limb instead of 32-bits. This would suit the native 16x16 multiply, but might make it difficult to get full value from the native 32x32 add/sub/etc. This would be an ABI option, and would select "__GMP_SHORT_LIMB" in gmp.h. Naturally an entirely new set of asm subroutines would be needed for a 16-bit limb. Also there's various places in the C code assuming limb>=long, which would need to be updated, eg. mpz_set_ui. Some of the nails changes may have helped cover some of this. ASM FILES The .asm files are put through m4 for macro processing, and with the help of configure give either MIT or Motorola syntax. The generic mpn/asm-defs.m4 is used, together with mpn/m68k/m68k-defs.m4. See comments in those files. Not all possible syntax variations are covered. GCC config/m68k for instance has things like $ for immediates on CRDS or reversed cmp order for AT&T SGS. These could probably be handled if anyone really needs it. CALLING CONVENTIONS The SVR4 standard has an int of 32 bits, and all parameters 32-bit aligned on the stack. PalmOS and perhaps various embedded systems intended for 68000 however use an int of 16 bits and parameters only 16-bit aligned on the stack. This is generated by "gcc -mshort" (and is the default for the PalmOS gcc port, we believe). The asm files adapt to these two ABIs by checking sizeof(unsigned), coming through config.m4 as SIZEOF_UNSIGNED. Only mpn_lshift and mpn_rshift are affected, all other routines take longs and pointers, which are 32-bits in both cases. Strictly speaking the size of an int doesn't determine the stack padding convention. But if int is 16 bits then we can definitely say the host system is not SVR4, and therefore may as well assume we're in 16-bit stack alignment. REFERENCES "Motorola M68000 Family Programmer's Reference Manual", available online, http://e-www.motorola.com/brdata/PDFDB/docs/M68000PM.pdf "System V Application Binary Interface: Motorola 68000 Processor Family Supplement", AT&T, 1990, ISBN 0-13-877553-6. Has details of calling conventions and ELF style PIC coding. ---------------- Local variables: mode: text fill-column: 76 End: ======================================================================== * mpn/m88k/README ======================================================================== Copyright 2003 Free Software Foundation, Inc. This file is part of the GNU MP Library. The GNU MP Library is free software; you can redistribute it and/or modify it under the terms of either: * the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. or * the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. or both in parallel, as here. The GNU MP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with the GNU MP Library. If not, see https://www.gnu.org/licenses/. M88K MPN SUBROUTINES This directory contains mpn functions for various m88k family chips. CODE ORGANIZATION m88k m88000, m88100 m88k/mc88110 m88110 STATUS The code herein is old and poorly maintained. * The .s files assume the system uses a "_" underscore prefix, which should be controlled by configure. * The mc88110/*.S files are using the defunct "sysdep.h" configuration scheme and won't compile. Conversion to the current m4 .asm style wouldn't be difficult. ---------------- Local variables: mode: text fill-column: 76 End: ======================================================================== * mpn/mips64/README ======================================================================== Copyright 1996 Free Software Foundation, Inc. This file is part of the GNU MP Library. The GNU MP Library is free software; you can redistribute it and/or modify it under the terms of either: * the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. or * the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. or both in parallel, as here. The GNU MP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with the GNU MP Library. If not, see https://www.gnu.org/licenses/. This directory contains mpn functions optimized for MIPS3. Example of processors that implement MIPS3 are R4000, R4400, R4600, R4700, and R8000. RELEVANT OPTIMIZATION ISSUES 1. On the R4000 and R4400, branches, both the plain and the "likely" ones, take 3 cycles to execute. (The fastest possible loop will take 4 cycles, because of the delay insn.) On the R4600, branches takes a single cycle On the R8000, branches often take no noticeable cycles, as they are executed in a separate function unit.. 2. The R4000 and R4400 have a load latency of 4 cycles. 3. On the R4000 and R4400, multiplies take a data-dependent number of cycles, contrary to the SGI documentation. There seem to be 3 or 4 possible latencies. 4. The R1x000 processors can issue one floating-point operation, two integer operations, and one memory operation per cycle. The FPU has very short latencies, while the integer multiply unit is non-pipelined. We should therefore write fp based mpn_Xmul_1. STATUS Good... ======================================================================== * mpn/pa32/README ======================================================================== Copyright 1996, 1999, 2001, 2002, 2004 Free Software Foundation, Inc. This file is part of the GNU MP Library. The GNU MP Library is free software; you can redistribute it and/or modify it under the terms of either: * the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. or * the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. or both in parallel, as here. The GNU MP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with the GNU MP Library. If not, see https://www.gnu.org/licenses/. This directory contains mpn functions for various HP PA-RISC chips. Code that runs faster on the PA7100 and later implementations, is in the pa7100 directory. RELEVANT OPTIMIZATION ISSUES Load and Store timing On the PA7000 no memory instructions can issue the two cycles after a store. For the PA7100, this is reduced to one cycle. The PA7100 has a lookup-free cache, so it helps to schedule loads and the dependent instruction really far from each other. STATUS 1. mpn_mul_1 could be improved to 6.5 cycles/limb on the PA7100, using the instructions below (but some sw pipelining is needed to avoid the xmpyu-fstds delay): fldds s1_ptr xmpyu fstds N(%r30) xmpyu fstds N(%r30) ldws N(%r30) ldws N(%r30) ldws N(%r30) ldws N(%r30) addc stws res_ptr addc stws res_ptr addib Loop 2. mpn_addmul_1 could be improved from the current 10 to 7.5 cycles/limb (asymptotically) on the PA7100, using the instructions below. With proper sw pipelining and the unrolling level below, the speed becomes 8 cycles/limb. fldds s1_ptr fldds s1_ptr xmpyu fstds N(%r30) xmpyu fstds N(%r30) xmpyu fstds N(%r30) xmpyu fstds N(%r30) ldws N(%r30) ldws N(%r30) ldws N(%r30) ldws N(%r30) ldws N(%r30) ldws N(%r30) ldws N(%r30) ldws N(%r30) addc addc addc addc addc %r0,%r0,cy-limb ldws res_ptr ldws res_ptr ldws res_ptr ldws res_ptr add stws res_ptr addc stws res_ptr addc stws res_ptr addc stws res_ptr addib 3. For the PA8000 we have to stick to using 32-bit limbs before compiler support emerges. But we want to use 64-bit operations whenever possible, in particular for loads and stores. It is possible to handle mpn_add_n efficiently by rotating (when s1/s2 are aligned), masking+bit field inserting when (they are not). The speed should double compared to the code used today. LABEL SYNTAX The HP-UX assembler takes labels starting in column 0 with no colon, L$loop ldws,mb -4(0,%r25),%r22 Gas on hppa GNU/Linux however requires a colon, L$loop: ldws,mb -4(0,%r25),%r22 This is covered by using LDEF() from asm-defs.m4. An alternative would be to use ".label" which is accepted by both, .label L$loop ldws,mb -4(0,%r25),%r22 but that's not as nice to look at, not if you're used to assembler code having labels in column 0. REFERENCES Hewlett Packard, "HP Assembler Reference Manual", 9th edition, June 1998, part number 92432-90012. ---------------- Local variables: mode: text fill-column: 76 End: ======================================================================== * mpn/pa64/README ======================================================================== Copyright 1999, 2001, 2002, 2004 Free Software Foundation, Inc. This file is part of the GNU MP Library. The GNU MP Library is free software; you can redistribute it and/or modify it under the terms of either: * the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. or * the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. or both in parallel, as here. The GNU MP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with the GNU MP Library. If not, see https://www.gnu.org/licenses/. This directory contains mpn functions for 64-bit PA-RISC 2.0. PIPELINE SUMMARY The PA8x00 processors have an orthogonal 4-way out-of-order pipeline. Each cycle two ALU operations and two MEM operations can issue, but just one of the MEM operations may be a store. The two ALU operations can be almost any combination of non-memory operations. Unlike every other processor, integer and fp operations are completely equal here; they both count as just ALU operations. Unfortunately, some operations cause hickups in the pipeline. Combining carry-consuming operations like ADD,DC with operations that does not set carry like ADD,L cause long delays. Skip operations also seem to cause hickups. If several ADD,DC are issued consecutively, or if plain carry-generating ADD feed ADD,DC, stalling does not occur. We can effectively issue two ADD,DC operations/cycle. Latency scheduling is not as important as making sure to have a mix of ALU and MEM operations, but for full pipeline utilization, it is still a good idea to do some amount of latency scheduling. Like for all other processors, RAW memory scheduling is critically important. Since integer multiplication takes place in the floating-point unit, the GMP code needs to handle this problem frequently. STATUS * mpn_lshift and mpn_rshift run at 1.5 cycles/limb on PA8000 and at 1.0 cycles/limb on PA8500. With latency scheduling, the numbers could probably be improved to 1.0 cycles/limb for all PA8x00 chips. * mpn_add_n and mpn_sub_n run at 2.0 cycles/limb on PA8000 and at about 1.6875 cycles/limb on PA8500. With latency scheduling, this could probably be improved to get close to 1.5 cycles/limb. A problem is the stalling of carry-inputting instructions after instructions that do not write to carry. * mpn_mul_1, mpn_addmul_1, and mpn_submul_1 run at between 5.625 and 6.375 on PA8500 and later, and about a cycle/limb slower on older chips. The code uses ADD,DC for adjacent limbs, and relies heavily on reordering. REFERENCES Hewlett Packard, "64-Bit Runtime Architecture for PA-RISC 2.0", version 3.3, October 1997. ======================================================================== * mpn/powerpc32/README ======================================================================== Copyright 2002, 2005 Free Software Foundation, Inc. This file is part of the GNU MP Library. The GNU MP Library is free software; you can redistribute it and/or modify it under the terms of either: * the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. or * the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. or both in parallel, as here. The GNU MP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with the GNU MP Library. If not, see https://www.gnu.org/licenses/. POWERPC 32-BIT MPN SUBROUTINES This directory contains mpn functions for various 32-bit PowerPC chips. CODE ORGANIZATION directory used for ================================================ powerpc generic, 604, 604e, 744x, 745x powerpc/750 740, 750, 7400, 7410 The top-level powerpc directory is currently mostly aimed at 604/604e but should be reasonable on all powerpcs. STATUS The code is quite well optimized for the 604e, other chips have had less attention. Altivec SIMD available in 74xx might hold some promise, but unfortunately GMP only guarantees 32-bit data alignment, so there's lots of fiddling around with partial operations at the start and end of limb vectors. A 128-bit limb would be a novel idea, but is unlikely to be practical, since it would have to work with ordinary +, -, * etc in the C code. Also, Altivec isn't very well suited for the GMP multiplication needs. Using floating-point based multiplication has much better better performance potential for all current powerpcs, both the ones with slow integer multiply units (603, 740, 750, 7400, 7410) and those with fast (604, 604e, 744x, 745x). This is because all powerpcs do some level of pipelining in the FPU: 603 and 750 can sustain one fmadd every 2nd cycle. 604 and 604e can sustain one fmadd per cycle. 7400 and 7410 can sustain 3 fmadd in 4 cycles. 744x and 745x can sustain 4 fmadd in 5 cycles. REGISTER NAMES The normal powerpc convention is to give registers as plain numbers, like "mtctr 6", but on Apple MacOS X (powerpc*-*-rhapsody* and powerpc*-*-darwin*) the assembler demands an "r" like "mtctr r6". Note however when register 0 in an instruction means a literal zero the "r" is omitted, for instance "lwzx r6,0,r7". The GMP code uses the "r" forms, powerpc-defs.m4 transforms them to plain numbers according to what GMP_ASM_POWERPC_R_REGISTERS finds is needed. (Note that this style isn't fully general, as the identifier r4 and the register r4 will not be distinguishable on some systems. However, this is not a problem for the limited GMP assembly usage.) GLOBAL REFERENCES Linux non-PIC lis 9, __gmp_binvert_limb_table@ha rlwinm 11, 5, 31, 25, 31 la 9, __gmp_binvert_limb_table@l(9) lbzx 11, 9, 11 Linux PIC (FIXME) .LCL0: .long .LCTOC1-.LCF0 bcl 20, 31, .LCF0 .LCF0: mflr 30 lwz 7, .LCL0-.LCF0(30) add 30, 7, 30 lwz 11, .LC0-.LCTOC1(30) rlwinm 3, 5, 31, 25, 31 lbzx 7, 11, 3 AIX (always PIC) LC..0: .tc __gmp_binvert_limb_table[TC],__gmp_binvert_limb_table[RW] lwz 9, LC..0(2) rlwinm 0, 5, 31, 25, 31 lbzx 0, 9, 0 Darwin (non-PIC) lis r2, ha16(___gmp_binvert_limb_table) rlwinm r9, r5, 31, 25, 31 la r2, lo16(___gmp_binvert_limb_table)(r2) lbzx r0, r2, r9 Darwin (PIC) mflr r0 bcl 20, 31, L0001$pb L0001$pb: mflr r7 mtlr r0 addis r2, r7, ha16(L___gmp_binvert_limb_table$non_lazy_ptr-L0001$pb) rlwinm r9, r5, 31, 25, 31 lwz r2, lo16(L___gmp_binvert_limb_table$non_lazy_ptr-L0001$pb)(r2) lbzx r0, r2, r9 ------ .non_lazy_symbol_pointer L___gmp_binvert_limb_table$non_lazy_ptr: .indirect_symbol ___gmp_binvert_limb_table .long 0 .subsections_via_symbols For GNU/Linux and Darwin, we might want to duplicate __gmp_binvert_limb_table into the text section in this file. We should thus be able to reach it like this: blr L0 L0: mflr r2 rlwinm r9, r5, 31, 25, 31 addi r9, r9, lo16(local_binvert_table-L0) lbzx r0, r2, r9 REFERENCES PowerPC Microprocessor Family: The Programming Environments for 32-bit Microprocessors, IBM document G522-0290-01, 2000. PowerPC 604e RISC Microprocessor User's Manual with Supplement for PowerPC 604 Microprocessor, IBM document G552-0330-00, Freescale document MPC604EUM/AD, 3/1998. MPC7410/MPC7400 RISC Microprocessor User's Manual, Freescale document MPC7400UM/D, rev 1, 11/2002. MPC7450 RISC Microprocessor Family Reference Manual, Freescale document MPC7450UM, rev 5, 1/2005. The above are available online from http://www.ibm.com/chips/techlib/techlib.nsf/productfamilies/PowerPC http://www.freescale.com/PowerPC ---------------- Local variables: mode: text fill-column: 76 End: ======================================================================== * mpn/powerpc64/README ======================================================================== Copyright 1999-2001, 2003-2005 Free Software Foundation, Inc. This file is part of the GNU MP Library. The GNU MP Library is free software; you can redistribute it and/or modify it under the terms of either: * the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. or * the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. or both in parallel, as here. The GNU MP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with the GNU MP Library. If not, see https://www.gnu.org/licenses/. POWERPC-64 MPN SUBROUTINES This directory contains mpn functions for 64-bit PowerPC chips. CODE ORGANIZATION mpn/powerpc64 mode-neutral code mpn/powerpc64/mode32 code for mode32 mpn/powerpc64/mode64 code for mode64 The mode32 and mode64 sub-directories contain code which is for use in the respective chip mode, 32 or 64. The top-level directory is code that's unaffected by the mode. The "adde" instruction is the main difference between mode32 and mode64. It operates on either on a 32-bit or 64-bit quantity according to the chip mode. Other instructions have an operand size in their opcode and hence don't vary. POWER3/PPC630 pipeline information: Decoding is 4-way + branch and issue is 8-way with some out-of-order capability. Functional units: LS1 - ld/st unit 1 LS2 - ld/st unit 2 FXU1 - integer unit 1, handles any simple integer instruction FXU2 - integer unit 2, handles any simple integer instruction FXU3 - integer unit 3, handles integer multiply and divide FPU1 - floating-point unit 1 FPU2 - floating-point unit 2 Memory: Any two memory operations can issue, but memory subsystem can sustain just one store per cycle. No need for data prefetch; the hardware has very sophisticated prefetch logic. Simple integer: 2 operations (such as add, rl*) Integer multiply: 1 operation every 9th cycle worst case; exact timing depends on 2nd operand's most significant bit position (10 bits per cycle). Multiply unit is not pipelined, only one multiply operation in progress is allowed. Integer divide: ? Floating-point: Any plain 2 arithmetic instructions (such as fmul, fadd, and fmadd), latency 4 cycles. Floating-point divide: ? Floating-point square root: ? POWER3/PPC630 best possible times for the main loops: shift: 1.5 cycles limited by integer unit contention. With 63 special loops, one for each shift count, we could reduce the needed integer instructions to 2, which would reduce the best possible time to 1 cycle. add/sub: 1.5 cycles, limited by ld/st unit contention. mul: 18 cycles (average) unless floating-point operations are used, but that would only help for multiplies of perhaps 10 and more limbs. addmul/submul:Same situation as for mul. POWER4/PPC970 and POWER5 pipeline information: This is a very odd pipeline, it is basically a VLIW masquerading as a plain architecture. Its issue rules are not made public, and since it is so weird, it is very hard to figure out any useful information from experimentation. An example: A well-aligned loop with nop's take 3, 4, 6, 7, ... cycles. 3 cycles for 0, 1, 2, 3, 4, 5, 6, 7 nop's 4 cycles for 8, 9, 10, 11, 12, 13, 14, 15 nop's 6 cycles for 16, 17, 18, 19, 20, 21, 22, 23 nop's 7 cycles for 24, 25, 26, 27 nop's 8 cycles for 28, 29, 30, 31 nop's ... continues regularly Functional units: LS1 - ld/st unit 1 LS2 - ld/st unit 2 FXU1 - integer unit 1, handles any integer instruction FXU2 - integer unit 2, handles any integer instruction FPU1 - floating-point unit 1 FPU2 - floating-point unit 2 While this is one integer unit less than POWER3/PPC630, the remaining units are more powerful; here they handle multiply and divide. Memory: 2 ld/st. Stores go to the L2 cache, which can sustain just one store per cycle. L1 load latency: to gregs 3-4 cycles, to fregs 5-6 cycles. Operations that modify the address register might be split to use also an integer issue slot. Simple integer: 2 operations every cycle, latency 2. Integer multiply: 2 operations every 6th cycle, latency 7 cycles. Integer divide: ? Floating-point: Any plain 2 arithmetic instructions (such as fmul, fadd, and fmadd), latency 6 cycles. Floating-point divide: ? Floating-point square root: ? IDEAS *mul_1: Handling one limb using mulld/mulhdu and two limbs using floating- point operations should give performance of about 20 cycles for 3 limbs, or 7 cycles/limb. We should probably split the single-limb operand in 32-bit chunks, and the multi-limb operand in 16-bit chunks, allowing us to accumulate well in fp registers. Problem is to get 32-bit or 16-bit words to the fp registers. Only 64-bit fp memops copies bits without fiddling with them. We might therefore need to load to integer registers with zero extension, store as 64 bits into temp space, and then load to fp regs. Alternatively, load directly to fp space and add well-chosen constants to get cancellation. (Other part after given by subsequent subtraction.) Possible code mix for load-via-intregs variant: lwz,std,lfd fmadd,fmadd,fmul,fmul fctidz,stfd,ld,fctidz,stfd,ld add,adde lwz,std,lfd fmadd,fmadd,fmul,fmul fctidz,stfd,ld,fctidz,stfd,ld add,adde srd,sld,add,adde,add,adde ======================================================================== * mpn/s390_64/README ======================================================================== Copyright 2011 Free Software Foundation, Inc. This file is part of the GNU MP Library. The GNU MP Library is free software; you can redistribute it and/or modify it under the terms of either: * the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. or * the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. or both in parallel, as here. The GNU MP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with the GNU MP Library. If not, see https://www.gnu.org/licenses/. There are 5 generations of 64-but s390 processors, z900, z990, z9, z10, and z196. The current GMP code was optimised for the two oldest, z900 and z990. mpn_copyi This code makes use of a loop around MVC. It almost surely runs very close to optimally. A small improvement could be done by using one MVC for size 256 bytes, now we use two (we use an extra MVC when copying any multiple of 256 bytes). mpn_copyd We have tried several feed-in variants here, branch tree, jump table and computed goto. The fastest (on z990) turned out to be computed goto. An approach not tried is EX of LMG and STMG, modifying the register set on-the-fly. Using that trick, we could completely avoid using separate feed-in paths. mpn_lshift, mpn_rshift The current code runs at pipeline decode bandwith on z990. mpn_add_n, mpn_sub_n The current code is 4-way unrolled. It should be unrolled more, at least 8x, in order to reach 2.5 c/l. mpn_mul_1, mpn_addmul_1, mpn_submul_1 The current code is very naive, but due to the non-pipelined nature of MLGR on z900 and z990, more sophisticated code would not gain much. On z10 one would need to cluster at least 4 MLGR together, in order to reduce stalling. On z196, one surely want to use unrolling and pipelining, to perhaps reach around 12 c/l. A major issue here and on z10 is ALCGR's 3 cycle stalling. mpn_mul_2, mpn_addmul_2 At least for older machines (z900, z990) with very slow MLGR, we should use Karatsuba's algorithm on 2-limb units, making mul_2 and addmul_2 the main multiplication primitives. The newer machines might benefit less from this approach, perhaps in particular z10, where MLGR clustering is more important. With Karatsuba, one could hope for around 16 cycles per accumulated 128 cross product, on z990. ======================================================================== * mpn/sparc32/README ======================================================================== Copyright 1996, 2001 Free Software Foundation, Inc. This file is part of the GNU MP Library. The GNU MP Library is free software; you can redistribute it and/or modify it under the terms of either: * the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. or * the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. or both in parallel, as here. The GNU MP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with the GNU MP Library. If not, see https://www.gnu.org/licenses/. This directory contains mpn functions for various SPARC chips. Code that runs only on version 8 SPARC implementations, is in the v8 subdirectory. RELEVANT OPTIMIZATION ISSUES Load and Store timing On most early SPARC implementations, the ST instructions takes multiple cycles, while a STD takes just a single cycle more than an ST. For the CPUs in SPARCstation I and II, the times are 3 and 4 cycles, respectively. Therefore, combining two ST instructions into a STD when possible is a significant optimization. Later SPARC implementations have single cycle ST. For SuperSPARC, we can perform just one memory instruction per cycle, even if up to two integer instructions can be executed in its pipeline. For programs that perform so many memory operations that there are not enough non-memory operations to issue in parallel with all memory operations, using LDD and STD when possible helps. UltraSPARC-1/2 has very slow integer multiplication. In the v9 subdirectory, we therefore use floating-point multiplication. STATUS 1. On a SuperSPARC, mpn_lshift and mpn_rshift run at 3 cycles/limb, or 2.5 cycles/limb asymptotically. We could optimize speed for special counts by using ADDXCC. 2. On a SuperSPARC, mpn_add_n and mpn_sub_n runs at 2.5 cycles/limb, or 2 cycles/limb asymptotically. 3. mpn_mul_1 runs at what is believed to be optimal speed. 4. On SuperSPARC, mpn_addmul_1 and mpn_submul_1 could both be improved by a cycle by avoiding one of the add instructions. See a29k/addmul_1. The speed of the code for other SPARC implementations is uncertain. ======================================================================== * mpn/sparc64/README ======================================================================== Copyright 1997, 1999-2002 Free Software Foundation, Inc. This file is part of the GNU MP Library. The GNU MP Library is free software; you can redistribute it and/or modify it under the terms of either: * the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. or * the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. or both in parallel, as here. The GNU MP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with the GNU MP Library. If not, see https://www.gnu.org/licenses/. This directory contains mpn functions for 64-bit V9 SPARC RELEVANT OPTIMIZATION ISSUES Notation: IANY = shift/add/sub/logical/sethi IADDLOG = add/sub/logical/sethi MEM = ld*/st* FA = fadd*/fsub*/f*to*/fmov* FM = fmul* UltraSPARC can issue four instructions per cycle, with these restrictions: * Two IANY instructions, but only one of these may be a shift. If there is a shift and an IANY instruction, the shift must precede the IANY instruction. * One FA. * One FM. * One branch. * One MEM. * IANY/IADDLOG/MEM must be insn 1, 2, or 3 in an issue bundle. Taken branches should not be in slot 4, since that makes the delay insn come from separate bundle. * If two IANY/IADDLOG instructions are to be executed in the same cycle and one of these is setting the condition codes, that instruction must be the second one. To summarize, ignoring branches, these are the bundles that can reach the peak execution speed: insn1 iany iany mem iany iany mem iany iany mem insn2 iaddlog mem iany mem iaddlog iany mem iaddlog iany insn3 mem iaddlog iaddlog fa fa fa fm fm fm insn4 fa/fm fa/fm fa/fm fm fm fm fa fa fa The 64-bit integer multiply instruction mulx takes from 5 cycles to 35 cycles, depending on the position of the most significant bit of the first source operand. When used for 32x32->64 multiplication, it needs 20 cycles. Furthermore, it stalls the processor while executing. We stay away from that instruction, and instead use floating-point operations. Floating-point add and multiply units are fully pipelined. The latency for UltraSPARC-1/2 is 3 cycles and for UltraSPARC-3 it is 4 cycles. Integer conditional move instructions cannot dual-issue with other integer instructions. No conditional move can issue 1-5 cycles after a load. (This might have been fixed for UltraSPARC-3.) The UltraSPARC-3 pipeline is very simular to the one of UltraSPARC-1/2 , but is somewhat slower. Branches execute slower, and there may be other new stalls. But integer multiply doesn't stall the entire CPU and also has a much lower latency. But it's still not pipelined, and thus useless for our needs. STATUS * mpn_lshift, mpn_rshift: The current code runs at 2.0 cycles/limb on UltraSPARC-1/2 and 2.65 on UltraSPARC-3. For UltraSPARC-1/2, the IEU0 functional unit is saturated with shifts. * mpn_add_n, mpn_sub_n: The current code runs at 4 cycles/limb on UltraSPARC-1/2 and 4.5 cycles/limb on UltraSPARC-3. The 4 instruction recurrency is the speed limiter. * mpn_addmul_1: The current code runs at 14 cycles/limb asymptotically on UltraSPARC-1/2 and 17.5 cycles/limb on UltraSPARC-3. On UltraSPARC-1/2, the code sustains 4 instructions/cycle. It might be possible to invent a better way of summing the intermediate 49-bit operands, but it is unlikely that it will save enough instructions to save an entire cycle. The load-use of the u operand is not enough scheduled for good L2 cache performance. The UltraSPARC-1/2 L1 cache is direct mapped, and since we use temporary stack slots that will conflict with the u and r operands, we miss to L2 very often. The load-use of the std/ldx pairs via the stack are perhaps over-scheduled. It would be possible to save two instructions: (1) The mov could be avoided if the std/ldx were less scheduled. (2) The ldx of the r operand could be split into two ld instructions, saving the shifts/masks. It should be possible to reach 14 cycles/limb for UltraSPARC-3 if the fp operations where rescheduled for this processor's 4-cycle latency. * mpn_mul_1: The current code is a straightforward edit of the mpn_addmul_1 code. It would be possible to shave one or two cycles from it, with some labour. * mpn_submul_1: Simpleminded code just calling mpn_mul_1 + mpn_sub_n. This means that it runs at 18 cycles/limb on UltraSPARC-1/2 and 23 cycles/limb on UltraSPARC-3. It would be possible to either match the mpn_addmul_1 performance, or in the worst case use one more instruction group. * US1/US2 cache conflict resolving. The direct mapped L1 date cache of US1/US2 is a problem for mul_1, addmul_1 (and a prospective submul_1). We should allocate a larger cache area, and put the stack temp area in a place that doesn't cause cache conflicts. ======================================================================== * mpn/x86/README ======================================================================== Copyright 1999-2002 Free Software Foundation, Inc. This file is part of the GNU MP Library. The GNU MP Library is free software; you can redistribute it and/or modify it under the terms of either: * the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. or * the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. or both in parallel, as here. The GNU MP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with the GNU MP Library. If not, see https://www.gnu.org/licenses/. X86 MPN SUBROUTINES This directory contains mpn functions for various 80x86 chips. CODE ORGANIZATION x86 i386, generic x86/i486 i486 x86/pentium Intel Pentium (P5, P54) x86/pentium/mmx Intel Pentium with MMX (P55) x86/p6 Intel Pentium Pro x86/p6/mmx Intel Pentium II, III x86/p6/p3mmx Intel Pentium III x86/k6 \ AMD K6 x86/k6/mmx / x86/k6/k62mmx AMD K6-2 x86/k7 \ AMD Athlon x86/k7/mmx / x86/pentium4 \ x86/pentium4/mmx | Intel Pentium 4 x86/pentium4/sse2 / The top-level x86 directory contains blended style code, meant to be reasonable on all x86s. STATUS The code is well-optimized for AMD and Intel chips, but there's nothing specific for Cyrix chips, nor for actual 80386 and 80486 chips. ASM FILES The x86 .asm files are BSD style assembler code, first put through m4 for macro processing. The generic mpn/asm-defs.m4 is used, together with mpn/x86/x86-defs.m4. See comments in those files. The code is meant for use with GNU "gas" or a system "as". There's no support for assemblers that demand Intel style code. STACK FRAME m4 macros are used to define the parameters passed on the stack, and these act like comments on what the stack frame looks like too. For example, mpn_mul_1() has the following. defframe(PARAM_MULTIPLIER, 16) defframe(PARAM_SIZE, 12) defframe(PARAM_SRC, 8) defframe(PARAM_DST, 4) PARAM_MULTIPLIER becomes `FRAME+16(%esp)', and the others similarly. The return address is at offset 0, but there's not normally any need to access that. FRAME is redefined as necessary through the code so it's the number of bytes pushed on the stack, and hence the offsets in the parameter macros stay correct. At the start of a routine FRAME should be zero. deflit(`FRAME',0) ... deflit(`FRAME',4) ... deflit(`FRAME',8) ... Helper macros FRAME_pushl(), FRAME_popl(), FRAME_addl_esp() and FRAME_subl_esp() exist to adjust FRAME for the effect of those instructions, and can be used instead of explicit definitions if preferred. defframe_pushl() is a combination FRAME_pushl() and defframe(). There's generally some slackness in redefining FRAME. If new values aren't going to get used then the redefinitions are omitted to keep from cluttering up the code. This happens for instance at the end of a routine, where there might be just four pops and then a ret, so FRAME isn't getting used. Local variables and saved registers can be similarly defined, with negative offsets representing stack space below the initial stack pointer. For example, defframe(SAVE_ESI, -4) defframe(SAVE_EDI, -8) defframe(VAR_COUNTER,-12) deflit(STACK_SPACE, 12) Here STACK_SPACE gets used in a "subl $STACK_SPACE, %esp" to allocate the space, and that instruction must be followed by a redefinition of FRAME (setting it equal to STACK_SPACE) to reflect the change in %esp. Definitions for pushed registers are only put in when they're going to be used. If registers are just saved and restored with pushes and pops then definitions aren't made. ASSEMBLER EXPRESSIONS Only addition and subtraction seem to be universally available, certainly that's all the Solaris 8 "as" seems to accept. If expressions are wanted then m4 eval() should be used. In particular note that a "/" anywhere in a line starts a comment in Solaris "as", and in some configurations of gas too. addl $32/2, %eax <-- wrong addl $eval(32/2), %eax <-- right Binutils gas/config/tc-i386.c has a choice between "/" being a comment anywhere in a line, or only at the start. FreeBSD patches 2.9.1 to select the latter, and from 2.9.5 it's the default for GNU/Linux too. ASSEMBLER COMMENTS Solaris "as" doesn't support "#" commenting, using /* */ instead. For that reason "C" commenting is used (see asm-defs.m4) and the intermediate ".s" files have no comments. Any comments before include(`../config.m4') must use m4 "dnl", since it's only after the include that "C" is available. By convention "dnl" is also used for comments about m4 macros. TEMPORARY LABELS Temporary numbered labels like "1:" used as "1f" or "1b" are available in "gas" and Solaris "as", but not in SCO "as". Normal L() labels should be used instead, possibly with a counter to make them unique, see jadcl0() in x86-defs.m4 for instance. A separate counter for each macro makes it possible to nest them, for instance movl_text_address() can be used within an ASSERT(). "1:" etc must be avoided in gcc __asm__ blocks too. "%=" for generating a unique number looks like a good alternative, but is that actually a documented feature? In any case this problem doesn't currently arise. ZERO DISPLACEMENTS In a couple of places addressing modes like 0(%ebx) with a byte-sized zero displacement are wanted, rather than (%ebx) with no displacement. These are either for computed jumps or to get desirable code alignment. Explicit .byte sequences are used to ensure the assembler doesn't turn 0(%ebx) into (%ebx). The Zdisp() macro in x86-defs.m4 is used for this. Current gas 2.9.5 or recent 2.9.1 leave 0(%ebx) as written, but old gas 1.92.3 changes it. In general changing would be the sort of "optimization" an assembler might perform, hence explicit ".byte"s are used where necessary. SHLD/SHRD INSTRUCTIONS The %cl count forms of double shift instructions like "shldl %cl,%eax,%ebx" must be written "shldl %eax,%ebx" for some assemblers. gas takes either, Solaris "as" doesn't allow %cl, gcc generates %cl for gas and NeXT (which is gas), and omits %cl elsewhere. For GMP an autoconf test GMP_ASM_X86_SHLDL_CL is used to determine whether %cl should be used, and the macros shldl, shrdl, shldw and shrdw in mpn/x86/x86-defs.m4 pass through or omit %cl as necessary. See the comments with those macros for usage. IMUL INSTRUCTION GCC config/i386/i386.md (cvs rev 1.187, 21 Oct 00) under *mulsi3_1 notes that the following two forms produce identical object code imul $12, %eax imul $12, %eax, %eax but that the former isn't accepted by some assemblers, in particular the SCO OSR5 COFF assembler. GMP follows GCC and uses only the latter form. (This applies only to immediate operands, the three operand form is only valid with an immediate.) DIRECTION FLAG The x86 calling conventions say that the direction flag should be clear at function entry and exit. (See iBCS2 and SVR4 ABI books, references below.) Although this has been so since the year dot, it's not absolutely clear whether it's universally respected. Since it's better to be safe than sorry, GMP follows glibc and does a "cld" if it depends on the direction flag being clear. This happens only in a few places. POSITION INDEPENDENT CODE Coding Style Defining the symbol PIC in m4 processing selects SVR4 / ELF style position independent code. This is necessary for shared libraries because they can be mapped into different processes at different virtual addresses. Actually, relocations are allowed but text pages with relocations aren't shared, defeating the purpose of a shared library. The GOT is used to access global data, and the PLT is used for functions. The use of the PLT adds a fixed cost to every function call, and the GOT adds a cost to any function accessing global variables. These are small but might be noticeable when working with small operands. Scope It's intended, as a matter of policy, that references within libgmp are resolved within libgmp. Certainly there's no need for an application to replace any internals, and we take the view that there's no value in an application subverting anything documented either. Resolving references within libgmp in theory means calls can be made with a plain PC-relative call instruction, which is faster and smaller than going through the PLT, and data references can be similarly PC-relative, saving a GOT entry and fetch from there. Unfortunately the normal linker behaviour doesn't allow us to do this. By default an R_386_PC32 PC-relative reference, either for a call or for data, is left in libgmp.so by the linker so that it can be resolved at runtime to a location in the application or another shared library. This means a text segment relocation which we don't want. -Bsymbolic Under the "-Bsymbolic" option, the linker resolves references to symbols within libgmp.so. This gives us the desired effect for R_386_PC32, ie. it's resolved at link time. It also resolves R_386_PLT32 calls directly to their target without creating a PLT entry (though if this is done to normal compiler-generated code it still leaves a setup of %ebx to _GLOBAL_OFFSET_TABLE_ which may then be unnecessary). Unfortunately -Bsymbolic does bad things to global variables defined in a shared library but accessed by non-PIC code from the mainline (or a static library). The problem is that the mainline needs a fixed data address to avoid text segment relocations, so space is allocated in its data segment and the value from the variable is copied from the shared library's data segment when the library is loaded. Under -Bsymbolic, however, references in the shared library are then resolved still to the shared library data area. Not surprisingly it bombs badly to have mainline code and library code accessing different locations for what should be one variable. Note that this -Bsymbolic effect for the shared library is not just for R_386_PC32 offsets which might have been cooked up in assembler, but is done also for the contents of GOT entries. -Bsymbolic simply applies a general rule that symbols are resolved first from the local module. Visibility Attributes GCC __attribute__ ((visibility ("protected"))), which is available in recent versions, eg. 3.3, is probably what we'd like to use. It makes gcc generate plain PC-relative calls to indicated functions, and directs the linker to resolve references to the given function within the link module. Unfortunately, as of debian binutils 2.13.90.0.16 at least, the resulting libgmp.so comes out with text segment relocations, references are not resolved at link time. If the gcc description is to be believed this is this not how it should work. If a symbol cannot be overridden by another module then surely references within that module can be resolved immediately (ie. at link time). Present In any case, all this means that we have no optimizations we can usefully make to function or variable usages, neither for assembler nor C code. Perhaps in the future the visibility attribute will work as we'd like. GLOBAL OFFSET TABLE The magic _GLOBAL_OFFSET_TABLE_ used by code establishing the address of the GOT sometimes requires an extra underscore prefix. SVR4 systems and NetBSD don't need a prefix, OpenBSD does need one. Note that NetBSD and OpenBSD are both a.out underscore systems, so the prefix for _GLOBAL_OFFSET_TABLE_ is not simply the same as the prefix for ordinary globals. In any case in the asm code we write _GLOBAL_OFFSET_TABLE_ and let a macro in x86-defs.m4 add an extra underscore if required (according to a configure test). Old gas 1.92.3 which comes with FreeBSD 2.2.8 gets a segmentation fault when asked to assemble the following, L1: addl $_GLOBAL_OFFSET_TABLE_+[.-L1], %ebx It seems that using the label in the same instruction it refers to is the problem, since a nop in between works. But the simplest workaround is to follow gcc and omit the +[.-L1] since it does nothing, addl $_GLOBAL_OFFSET_TABLE_, %ebx Current gas 2.10 generates incorrect object code when %eax is used in such a construction (with or without +[.-L1]), addl $_GLOBAL_OFFSET_TABLE_, %eax The R_386_GOTPC gets a displacement of 2 rather than the 1 appropriate for the 1 byte opcode of "addl $n,%eax". The best workaround is just to use any other register, since then it's a two byte opcode+mod/rm. GCC for example always uses %ebx (which is needed for calls through the PLT). A similar problem occurs in an leal (again with or without a +[.-L1]), leal _GLOBAL_OFFSET_TABLE_(%edi), %ebx This time the R_386_GOTPC gets a displacement of 0 rather than the 2 appropriate for the opcode and mod/rm, making this form unusable. SIMPLE LOOPS The overheads in setting up for an unrolled loop can mean that at small sizes a simple loop is faster. Making small sizes go fast is important, even if it adds a cycle or two to bigger sizes. To this end various routines choose between a simple loop and an unrolled loop according to operand size. The path to the simple loop, or to special case code for small sizes, is always as fast as possible. Adding a simple loop requires a conditional jump to choose between the simple and unrolled code. The size of a branch misprediction penalty affects whether a simple loop is worthwhile. The convention is for an m4 definition UNROLL_THRESHOLD to set the crossover point, with sizes < UNROLL_THRESHOLD using the simple loop, sizes >= UNROLL_THRESHOLD using the unrolled loop. If position independent code adds a couple of cycles to an unrolled loop setup, the threshold will vary with PIC or non-PIC. Something like the following is typical. deflit(UNROLL_THRESHOLD, ifdef(`PIC',10,8)) There's no automated way to determine the threshold. Setting it to a small value and then to a big value makes it possible to measure the simple and unrolled loops each over a range of sizes, from which the crossover point can be determined. Alternately, just adjust the threshold up or down until there's no more speedups. UNROLLED LOOP CODING The x86 addressing modes allow a byte displacement of -128 to +127, making it possible to access 256 bytes, which is 64 limbs, without adjusting pointer registers within the loop. Dword sized displacements can be used too, but they increase code size, and unrolling to 64 ought to be enough. When unrolling to the full 64 limbs/loop, the limb at the top of the loop will have a displacement of -128, so pointers have to have a corresponding +128 added before entering the loop. When unrolling to 32 limbs/loop displacements 0 to 127 can be used with 0 at the top of the loop and no adjustment needed to the pointers. Where 64 limbs/loop is supported, the +128 adjustment is done only when 64 limbs/loop is selected. Usually the gain in speed using 64 instead of 32 or 16 is small, so support for 64 limbs/loop is generally only for comparison. COMPUTED JUMPS When working from least significant limb to most significant limb (most routines) the computed jump and pointer calculations in preparation for an unrolled loop are as follows. S = operand size in limbs N = number of limbs per loop (UNROLL_COUNT) L = log2 of unrolling (UNROLL_LOG2) M = mask for unrolling (UNROLL_MASK) C = code bytes per limb in the loop B = bytes per limb (4 for x86) computed jump (-S & M) * C + entrypoint subtract from pointers (-S & M) * B initial loop counter (S-1) >> L displacements 0 to B*(N-1) The loop counter is decremented at the end of each loop, and the looping stops when the decrement takes the counter to -1. The displacements are for the addressing accessing each limb, eg. a load with "movl disp(%ebx), %eax". Usually the multiply by "C" can be handled without an imul, using instead an leal, or a shift and subtract. When working from most significant to least significant limb (eg. mpn_lshift and mpn_copyd), the calculations change as follows. add to pointers (-S & M) * B displacements 0 to -B*(N-1) OLD GAS 1.92.3 This version comes with FreeBSD 2.2.8 and has a couple of gremlins that affect GMP code. Firstly, an expression involving two forward references to labels comes out as zero. For example, addl $bar-foo, %eax foo: nop bar: This should lead to "addl $1, %eax", but it comes out as "addl $0, %eax". When only one forward reference is involved, it works correctly, as for example, foo: addl $bar-foo, %eax nop bar: Secondly, an expression involving two labels can't be used as the displacement for an leal. For example, foo: nop bar: leal bar-foo(%eax,%ebx,8), %ecx A slightly cryptic error is given, "Unimplemented segment type 0 in parse_operand". When only one label is used it's ok, and the label can be a forward reference too, as for example, leal foo(%eax,%ebx,8), %ecx nop foo: These problems only affect PIC computed jump calculations. The workarounds are just to do an leal without a displacement and then an addl, and to make sure the code is placed so that there's at most one forward reference in the addl. REFERENCES "Intel Architecture Software Developer's Manual", volumes 1, 2a, 2b, 3a, 3b, 2006, order numbers 253665 through 253669. Available on-line, ftp://download.intel.com/design/Pentium4/manuals/25366518.pdf ftp://download.intel.com/design/Pentium4/manuals/25366618.pdf ftp://download.intel.com/design/Pentium4/manuals/25366718.pdf ftp://download.intel.com/design/Pentium4/manuals/25366818.pdf ftp://download.intel.com/design/Pentium4/manuals/25366918.pdf "System V Application Binary Interface", Unix System Laboratories Inc, 1992, published by Prentice Hall, ISBN 0-13-880410-9. And the "Intel386 Processor Supplement", AT&T, 1991, ISBN 0-13-877689-X. These have details of calling conventions and ELF shared library PIC coding. Versions of both available on-line, http://www.sco.com/developer/devspecs "Intel386 Family Binary Compatibility Specification 2", Intel Corporation, published by McGraw-Hill, 1991, ISBN 0-07-031219-2. (Same as the above 386 ABI supplement.) ---------------- Local variables: mode: text fill-column: 76 End: ======================================================================== * mpn/x86/k6/README ======================================================================== Copyright 2000, 2001 Free Software Foundation, Inc. This file is part of the GNU MP Library. The GNU MP Library is free software; you can redistribute it and/or modify it under the terms of either: * the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. or * the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. or both in parallel, as here. The GNU MP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with the GNU MP Library. If not, see https://www.gnu.org/licenses/. AMD K6 MPN SUBROUTINES This directory contains code optimized for AMD K6 CPUs, meaning K6, K6-2 and K6-3. The mmx subdirectory has MMX code suiting plain K6, the k62mmx subdirectory has MMX code suiting K6-2 and K6-3. All chips in the K6 family have MMX, the separate directories are just so that ./configure can omit them if the assembler doesn't support MMX. STATUS Times for the loops, with all code and data in L1 cache, are as follows. cycles/limb mpn_add_n/sub_n 3.25 normal, 2.75 in-place mpn_mul_1 6.25 mpn_add/submul_1 7.65-8.4 (varying with data values) mpn_mul_basecase 9.25 cycles/crossproduct (approx) mpn_sqr_basecase 4.7 cycles/crossproduct (approx) or 9.2 cycles/triangleproduct (approx) mpn_l/rshift 3.0 mpn_divrem_1 20.0 mpn_mod_1 20.0 mpn_divexact_by3 11.0 mpn_copyi 1.0 mpn_copyd 1.0 K6-2 and K6-3 have dual-issue MMX and get the following improvements. mpn_l/rshift 1.75 Prefetching of sources hasn't yet given any joy. With the 3DNow "prefetch" instruction, code seems to run slower, and with just "mov" loads it doesn't seem faster. Results so far are inconsistent. The K6 does a hardware prefetch of the second cache line in a sector, so the penalty for not prefetching in software is reduced. NOTES All K6 family chips have MMX, but only K6-2 and K6-3 have 3DNow. Plain K6 executes MMX instructions only in the X pipe, but K6-2 and K6-3 can execute them in both X and Y (and in both together). Branch misprediction penalty is 1 to 4 cycles (Optimization Manual chapter 6 table 12). Write-allocate L1 data cache means prefetching of destinations is unnecessary. Store queue is 7 entries of 64 bits each. Floating point multiplications can be done in parallel with integer multiplications, but there doesn't seem to be any way to make use of this. OPTIMIZATIONS Unrolled loops are used to reduce looping overhead. The unrolling is configurable up to 32 limbs/loop for most routines, up to 64 for some. Sometimes computed jumps into the unrolling are used to handle sizes not a multiple of the unrolling. An attractive feature of this is that times smoothly increase with operand size, but an indirect jump is about 6 cycles and the setups about another 6, so it depends on how much the unrolled code is faster than a simple loop as to whether a computed jump ought to be used. Position independent code is implemented using a call to get eip for computed jumps and a ret is always done, rather than an addl $4,%esp or a popl, so the CPU return address branch prediction stack stays synchronised with the actual stack in memory. Such a call however still costs 4 to 7 cycles. Branch prediction, in absence of any history, will guess forward jumps are not taken and backward jumps are taken. Where possible it's arranged that the less likely or less important case is under a taken forward jump. MMX Putting emms or femms as late as possible in a routine seems to be fastest. Perhaps an emms or femms stalls until all outstanding MMX instructions have completed, so putting it later gives them a chance to complete on their own, in parallel with other operations (like register popping). The Optimization Manual chapter 5 recommends using a femms on K6-2 and K6-3 at the start of a routine, in case it's been preceded by x87 floating point operations. This isn't done because in gmp programs it's expected that x87 floating point won't be much used and that chances are an mpn routine won't have been preceded by any x87 code. CODING Instructions in general code are shown paired if they can decode and execute together, meaning two short decode instructions with the second not depending on the first, only the first using the shifter, no more than one load, and no more than one store. K6 does some out of order execution so the pairings aren't essential, they just show what slots might be available. When decoding is the limiting factor things can be scheduled that might not execute until later. NOTES Code alignment - if an opcode/modrm or 0Fh/opcode/modrm crosses a cache line boundary, short decode is inhibited. The cross.pl script detects this. - loops and branch targets should be aligned to 16 bytes, or ensure at least 2 instructions before a 32 byte boundary. This makes use of the 16 byte cache in the BTB. Addressing modes - (%esi) degrades decoding from short to vector. 0(%esi) doesn't have this problem, and can be used as an equivalent, or easier is just to use a different register, like %ebx. - K6 and pre-CXT core K6-2 have the following problem. (K6-2 CXT and K6-3 have it fixed, these being cpuid function 1 signatures 0x588 to 0x58F). If more than 3 bytes are needed to determine instruction length then decoding degrades from direct to long, or from long to vector. This happens with forms like "0F opcode mod/rm" with mod/rm=00-xxx-100 since with mod=00 the sib determines whether there's a displacement. This affects all MMX and 3DNow instructions, and others with an 0F prefix, like movzbl. The modes affected are anything with an index and no displacement, or an index but no base, and this includes (%esp) which is really (,%esp,1). The cross.pl script detects problem cases. The workaround is to always use a displacement, and to do this with Zdisp if it's zero so the assembler doesn't discard it. See Optimization Manual rev D page 67 and 3DNow Porting Guide rev B pages 13-14 and 36-37. Calls - indirect jumps and calls are not branch predicted, they measure about 6 cycles. Various - adcl 2 cycles of decode, maybe 2 cycles executing in the X pipe - bsf 12-27 cycles - emms 5 cycles - femms 3 cycles - jecxz 2 cycles taken, 13 not taken (optimization manual says 7 not taken) - divl 20 cycles back-to-back - imull 2 decode, 3 execute - mull 2 decode, 3 execute (optimization manual decoding sample) - prefetch 2 cycles - rcll/rcrl implicit by one bit: 2 cycles immediate or %cl count: 11 + 2 per bit for dword 13 + 4 per bit for byte - setCC 2 cycles - xchgl %eax,reg 1.5 cycles, back-to-back (strange) reg,reg 2 cycles, back-to-back REFERENCES "AMD-K6 Processor Code Optimization Application Note", AMD publication number 21924, revision D amendment 0, January 2000. This describes K6-2 and K6-3. Available on-line, http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/21924.pdf "AMD-K6 MMX Enhanced Processor x86 Code Optimization Application Note", AMD publication number 21828, revision A amendment 0, August 1997. This is an older edition of the above document, describing plain K6. Available on-line, http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/21828.pdf "3DNow Technology Manual", AMD publication number 21928G/0-March 2000. This describes the femms and prefetch instructions, but nothing else from 3DNow has been used. Available on-line, http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/21928.pdf "3DNow Instruction Porting Guide", AMD publication number 22621, revision B, August 1999. This has some notes on general K6 optimizations as well as 3DNow. Available on-line, http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22621.pdf ---------------- Local variables: mode: text fill-column: 76 End: ======================================================================== * mpn/x86/k7/README ======================================================================== Copyright 2000, 2001 Free Software Foundation, Inc. This file is part of the GNU MP Library. The GNU MP Library is free software; you can redistribute it and/or modify it under the terms of either: * the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. or * the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. or both in parallel, as here. The GNU MP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with the GNU MP Library. If not, see https://www.gnu.org/licenses/. AMD K7 MPN SUBROUTINES This directory contains code optimized for the AMD Athlon CPU. The mmx subdirectory has routines using MMX instructions. All Athlons have MMX, the separate directory is just so that configure can omit it if the assembler doesn't support MMX. STATUS Times for the loops, with all code and data in L1 cache. cycles/limb mpn_add/sub_n 1.6 mpn_copyi 0.75 or 1.0 \ varying with data alignment mpn_copyd 0.75 or 1.0 / mpn_divrem_1 17.0 integer part, 15.0 fractional part mpn_mod_1 17.0 mpn_divexact_by3 8.0 mpn_l/rshift 1.2 mpn_mul_1 3.4 mpn_addmul/submul_1 3.9 mpn_mul_basecase 4.42 cycles/crossproduct (approx) mpn_sqr_basecase 2.3 cycles/crossproduct (approx) or 4.55 cycles/triangleproduct (approx) Prefetching of sources hasn't yet been tried. NOTES cmov, MMX, 3DNow and some extensions to MMX and 3DNow are available. Write-allocate L1 data cache means prefetching of destinations is unnecessary. Floating point multiplications can be done in parallel with integer multiplications, but there doesn't seem to be any way to make use of this. Unsigned "mul"s can be issued every 3 cycles. This suggests 3 is a limit on the speed of the multiplication routines. The documentation shows mul executing in IEU0 (or maybe in IEU0 and IEU1 together), so it might be that, to get near 3 cycles code has to be arranged so that nothing else is issued to IEU0. A busy IEU0 could explain why some code takes 4 cycles and other apparently equivalent code takes 5. OPTIMIZATIONS Unrolled loops are used to reduce looping overhead. The unrolling is configurable up to 32 limbs/loop for most routines and up to 64 for some. The K7 has 64k L1 code cache so quite big unrolling is allowable. Computed jumps into the unrolling are used to handle sizes not a multiple of the unrolling. An attractive feature of this is that times increase smoothly with operand size, but it may be that some routines should just have simple loops to finish up, especially when PIC adds between 2 and 16 cycles to get %eip. Position independent code is implemented using a call to get %eip for the computed jumps and a ret is always done, rather than an addl $4,%esp or a popl, so the CPU return address branch prediction stack stays synchronised with the actual stack in memory. Branch prediction, in absence of any history, will guess forward jumps are not taken and backward jumps are taken. Where possible it's arranged that the less likely or less important case is under a taken forward jump. CODING Instructions in general code have been shown grouped if they can execute together, which means up to three direct-path instructions which have no successive dependencies. K7 always decodes three and has out-of-order execution, but the groupings show what slots might be available and what dependency chains exist. When there's vector-path instructions an effort is made to get triplets of direct-path instructions in between them, even if there's dependencies, since this maximizes decoding throughput and might save a cycle or two if decoding is the limiting factor. INSTRUCTIONS adcl direct divl 39 cycles back-to-back lodsl,etc vector loop 1 cycle vector (decl/jnz opens up one decode slot) movd reg vector movd mem direct mull issue every 3 cycles, latency 4 cycles low word, 6 cycles high word popl vector (use movl for more than one pop) pushl direct, will pair with a load shrdl %cl vector, 3 cycles, seems to be 3 decode too xorl r,r false read dependency recognised REFERENCES "AMD Athlon Processor X86 Code Optimization Guide", AMD publication number 22007, revision K, February 2002. Available on-line, http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf "3DNow Technology Manual", AMD publication number 21928G/0-March 2000. This describes the femms and prefetch instructions. Available on-line, http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/21928.pdf "AMD Extensions to the 3DNow and MMX Instruction Sets Manual", AMD publication number 22466, revision D, March 2000. This describes instructions added in the Athlon processor, such as pswapd and the extra prefetch forms. Available on-line, http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22466.pdf "3DNow Instruction Porting Guide", AMD publication number 22621, revision B, August 1999. This has some notes on general Athlon optimizations as well as 3DNow. Available on-line, http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22621.pdf ---------------- Local variables: mode: text fill-column: 76 End: ======================================================================== * mpn/x86/p6/README ======================================================================== Copyright 2000, 2001 Free Software Foundation, Inc. This file is part of the GNU MP Library. The GNU MP Library is free software; you can redistribute it and/or modify it under the terms of either: * the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. or * the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. or both in parallel, as here. The GNU MP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with the GNU MP Library. If not, see https://www.gnu.org/licenses/. INTEL P6 MPN SUBROUTINES This directory contains code optimized for Intel P6 class CPUs, meaning PentiumPro, Pentium II and Pentium III. The mmx and p3mmx subdirectories have routines using MMX instructions. STATUS Times for the loops, with all code and data in L1 cache, are as follows. Some of these might be able to be improved. cycles/limb mpn_add_n/sub_n 3.7 mpn_copyi 0.75 mpn_copyd 1.75 (or 0.75 if no overlap) mpn_divrem_1 39.0 mpn_mod_1 21.5 mpn_divexact_by3 8.5 mpn_mul_1 5.5 mpn_addmul/submul_1 6.35 mpn_l/rshift 2.5 mpn_mul_basecase 8.2 cycles/crossproduct (approx) mpn_sqr_basecase 4.0 cycles/crossproduct (approx) or 7.75 cycles/triangleproduct (approx) Pentium II and III have MMX and get the following improvements. mpn_divrem_1 25.0 integer part, 17.5 fractional part mpn_l/rshift 1.75 NOTES Write-allocate L1 data cache means prefetching of destinations is unnecessary. Mispredicted branches have a penalty of between 9 and 15 cycles, and even up to 26 cycles depending how far speculative execution has gone. The 9 cycle minimum penalty comes from the issue pipeline being 9 stages. A copy with rep movs seems to copy 16 bytes at a time, since speeds for 4, 5, 6 or 7 limb operations are all the same. The 0.75 cycles/limb would be 3 cycles per 16 byte block. CODING Instructions in general code have been shown grouped if they can execute together, which means up to three instructions with no successive dependencies, and with only the first being a multiple micro-op. P6 has out-of-order execution, so the groupings are really only showing dependent paths where some shuffling might allow some latencies to be hidden. REFERENCES "Intel Architecture Optimization Reference Manual", 1999, revision 001 dated 02/99, order number 245127 (order number 730795-001 is in the document too). Available on-line: http://download.intel.com/design/PentiumII/manuals/245127.htm "Intel Architecture Optimization Manual", 1997, order number 242816. This is an older document mostly about P5 and not as good as the above. Available on-line: http://download.intel.com/design/PentiumII/manuals/242816.htm ---------------- Local variables: mode: text fill-column: 76 End: ======================================================================== * mpn/x86/pentium/README ======================================================================== Copyright 1996, 1999-2001, 2003 Free Software Foundation, Inc. This file is part of the GNU MP Library. The GNU MP Library is free software; you can redistribute it and/or modify it under the terms of either: * the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. or * the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. or both in parallel, as here. The GNU MP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with the GNU MP Library. If not, see https://www.gnu.org/licenses/. INTEL PENTIUM P5 MPN SUBROUTINES This directory contains mpn functions optimized for Intel Pentium (P5,P54) processors. The mmx subdirectory has additional code for Pentium with MMX (P55). STATUS cycles/limb mpn_add_n/sub_n 2.375 mpn_mul_1 12.0 mpn_add/submul_1 14.0 mpn_mul_basecase 14.2 cycles/crossproduct (approx) mpn_sqr_basecase 8 cycles/crossproduct (approx) or 15.5 cycles/triangleproduct (approx) mpn_l/rshift 5.375 normal (6.0 on P54) 1.875 special shift by 1 bit mpn_divrem_1 44.0 mpn_mod_1 28.0 mpn_divexact_by3 15.0 mpn_copyi/copyd 1.0 Pentium MMX gets the following improvements mpn_l/rshift 1.75 mpn_mul_1 12.0 normal, 7.0 for 16-bit multiplier mpn_add_n and mpn_sub_n run at asymptotically 2 cycles/limb. Due to loop overhead and other delays (cache refill?), they run at or near 2.5 cycles/limb. mpn_mul_1, mpn_addmul_1, mpn_submul_1 all run 1 cycle faster than they should. Intel documentation says a mul instruction is 10 cycles, but it measures 9 and the routines using it run as 9. P55 MMX AND X87 The cost of switching between MMX and x87 floating point on P55 is about 100 cycles (fld1/por/emms for instance). In order to avoid that the two aren't mixed and currently that means using MMX and not x87. MMX offers a big speedup for lshift and rshift, and a nice speedup for 16-bit multipliers in mpn_mul_1. If fast code using x87 is found then perhaps the preference for MMX will be reversed. P54 SHLDL mpn_lshift and mpn_rshift run at about 6 cycles/limb on P5 and P54, but the documentation indicates that they should take only 43/8 = 5.375 cycles/limb, or 5 cycles/limb asymptotically. The P55 runs them at the expected speed. It seems that on P54 a shldl or shrdl allows pairing in one following cycle, but not two. For example, back to back repetitions of the following shldl( %cl, %eax, %ebx) xorl %edx, %edx xorl %esi, %esi run at 5 cycles, as expected, but repetitions of the following run at 7 cycles, whereas 6 would be expected (and is achieved on P55), shldl( %cl, %eax, %ebx) xorl %edx, %edx xorl %esi, %esi xorl %edi, %edi xorl %ebp, %ebp Three xorls run at 7 cycles too, so it doesn't seem to be just that pairing inhibited is only in the second following cycle (or something like that). Avoiding this problem would bring P54 shifts down from 6.0 c/l to 5.5 with a pattern of shift, 2 loads, shift, 2 stores, shift, etc. A start has been made on something like that, but it's not yet complete. OTHER NOTES Prefetching Destinations Pentium doesn't allocate cache lines on writes, unlike most other modern processors. Since the functions in the mpn class do array writes, we have to handle allocating the destination cache lines by reading a word from it in the loops, to achieve the best performance. Prefetching Sources Prefetching of sources is pointless since there's no out-of-order loads. Any load instruction blocks until the line is brought to L1, so it may as well be the load that wants the data which blocks. Data Cache Bank Clashes Pairing of memory operations requires that the two issued operations refer to different cache banks (ie. different addresses modulo 32 bytes). The simplest way to ensure this is to read/write two words from the same object. If we make operations on different objects, they might or might not be to the same cache bank. PIC %eip Fetching A simple call $+5 and popl can be used to get %eip, there's no need to balance calls and returns since P5 doesn't have any return stack branch prediction. Float Multiplies fmul is pairable and can be issued every 2 cycles (with a 4 cycle latency for data ready to use). This is a lot better than integer mull or imull at 9 cycles non-pairing. Unfortunately the advantage is quickly eaten away by needing to throw data through memory back to the integer registers to adjust for fild and fist being signed, and to do things like propagating carry bits. REFERENCES "Intel Architecture Optimization Manual", 1997, order number 242816. This is mostly about P5, the parts about P6 aren't relevant. Available on-line: http://download.intel.com/design/PentiumII/manuals/242816.htm ---------------- Local variables: mode: text fill-column: 76 End: ======================================================================== * mpn/x86/pentium4/README ======================================================================== Copyright 2001 Free Software Foundation, Inc. This file is part of the GNU MP Library. The GNU MP Library is free software; you can redistribute it and/or modify it under the terms of either: * the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. or * the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. or both in parallel, as here. The GNU MP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with the GNU MP Library. If not, see https://www.gnu.org/licenses/. INTEL PENTIUM-4 MPN SUBROUTINES This directory contains mpn functions optimized for Intel Pentium-4. The mmx subdirectory has routines using MMX instructions, the sse2 subdirectory has routines using SSE2 instructions. All P4s have these, the separate directories are just so configure can omit that code if the assembler doesn't support it. STATUS cycles/limb mpn_add_n/sub_n 4 normal, 6 in-place mpn_mul_1 4 normal, 6 in-place mpn_addmul_1 6 mpn_submul_1 7 mpn_mul_basecase 6 cycles/crossproduct (approx) mpn_sqr_basecase 3.5 cycles/crossproduct (approx) or 7.0 cycles/triangleproduct (approx) mpn_l/rshift 1.75 The shifts ought to be able to go at 1.5 c/l, but not much effort has been applied to them yet. In-place operations, and all addmul, submul, mul_basecase and sqr_basecase calls, suffer from pipeline anomalies associated with write combining and movd reads and writes to the same or nearby locations. The movq instructions do not trigger the same hardware problems. Unfortunately, using movq and splitting/combining seems to require too many extra instructions to help. Perhaps future chip steppings will be better. NOTES The Pentium-4 pipeline "Netburst", provides for quite a number of surprises. Many traditional x86 instructions run very slowly, requiring use of alterative instructions for acceptable performance. adcl and sbbl are quite slow at 8 cycles for reg->reg. paddq of 32-bits within a 64-bit mmx register seems better, though the combination paddq/psrlq when propagating a carry is still a 4 cycle latency. incl and decl should be avoided, instead use add $1 and sub $1. Apparently the carry flag is not separately renamed, so incl and decl depend on all previous flags-setting instructions. shll and shrl have a 4 cycle latency, or 8 times the latency of the fastest integer instructions (addl, subl, orl, andl, and some more). shldl and shrdl seem to have 13 and 15 cycles latency, respectively. Bizarre. movq mmx -> mmx does have 6 cycle latency, as noted in the documentation. pxor/por or similar combination at 2 cycles latency can be used instead. The movq however executes in the float unit, thereby saving MMX execution resources. With the right juggling, data moves shouldn't be on a dependent chain. L1 is write-through, but the write-combining sounds like it does enough to not require explicit destination prefetching. xmm registers so far haven't found a use, but not much effort has been expended. A configure test for whether the operating system knows fxsave/fxrestor will be needed if they're used. REFERENCES Intel Pentium-4 processor manuals, http://developer.intel.com/design/pentium4/manuals "Intel Pentium 4 Processor Optimization Reference Manual", Intel, 2001, order number 248966. Available on-line: http://developer.intel.com/design/pentium4/manuals/248966.htm ---------------- Local variables: mode: text fill-column: 76 End: ======================================================================== * mpn/x86_64/README ======================================================================== Copyright 2003, 2004, 2006, 2008 Free Software Foundation, Inc. This file is part of the GNU MP Library. The GNU MP Library is free software; you can redistribute it and/or modify it under the terms of either: * the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. or * the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. or both in parallel, as here. The GNU MP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with the GNU MP Library. If not, see https://www.gnu.org/licenses/. AMD64 MPN SUBROUTINES This directory contains mpn functions for AMD64 chips. It is also useful for 64-bit Pentiums, and "Core 2". RELEVANT OPTIMIZATION ISSUES The Opteron and Athlon64 can sustain up to 3 instructions per cycle, but in practice that is only possible for integer instructions. But almost any three integer instructions can issue simultaneously, including any 3 ALU operations, including shifts. Up to two memory operations can issue each cycle. Scheduling typically requires that load-use instructions are split into separate load and use instructions. That requires more decode resources, and it is rarely a win. Opteron/Athlon64 have deep out-of-order core. Optimizing for 64-bit Pentium4 is probably a waste of time, as the most critical instructions are very poorly implemented here. Perhaps we could save a cycle or two, but the most common loops now run at between 10 and 22 cycles, so a saved cycle isn't too exciting. The new spin of the venerable P6 core, the "Core 2" is much better than the Pentium4 for the GMP loops. Its integer pipeline is somewhat similar to to the Opteron/Athlon64 pipeline, except that the GMP favourites ADC/SBB and MUL are slower. Furthermore, an INC/DEC followed by ADC/SBB incur a pipeline stall of around 10 cycles. The default mpn_add_n and mpn_sub_n code suffers badly from the stall. The code in the core2 subdirectory uses the almost forgotten instruction JRCXZ for loop control, and updates the induction variable using LEA. REFERENCES "System V Application Binary Interface AMD64 Architecture Processor Supplement", draft version 0.99, December 2007. http://www.x86-64.org/documentation/abi.pdf ======================================================================== * tune/README ======================================================================== Copyright 2000-2002, 2004 Free Software Foundation, Inc. This file is part of the GNU MP Library. The GNU MP Library is free software; you can redistribute it and/or modify it under the terms of either: * the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. or * the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. or both in parallel, as here. The GNU MP Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received copies of the GNU General Public License and the GNU Lesser General Public License along with the GNU MP Library. If not, see https://www.gnu.org/licenses/. GMP SPEED MEASURING AND PARAMETER TUNING The programs in this directory are for knowledgeable users who want to measure GMP routines on their machine, and perhaps tweak some settings or identify things that can be improved. The programs here are tools, not ready to run solutions. Nothing is built in a normal "make all", but various Makefile targets described below exist. Relatively few systems and CPUs have been tested, so be sure to verify that results are sensible before relying on them. MISCELLANEOUS NOTES --enable-assert Don't configure with --enable-assert, since the extra code added by assertion checking may influence measurements. Direct mapped caches Some effort has been made to accommodate CPUs with direct mapped caches, by putting data blocks more or less contiguously on the stack. But this will depend on TMP_ALLOC using alloca, and even then it may or may not be enough. FreeBSD 4.2 i486 getrusage This getrusage seems to be a bit doubtful, it looks like it's microsecond accurate, but sometimes ru_utime remains unchanged after a time of many microseconds has elapsed. It'd be good to detect this in the time.c initializations, but for now the suggestion is to pretend it doesn't exist. ./configure ac_cv_func_getrusage=no NetBSD 1.4.1 m68k macintosh time base On this system it's been found getrusage often goes backwards, making it unusable (time.c getrusage_backwards_p detects this). gettimeofday sometimes doesn't update atomically when it crosses a 1 second boundary. Not sure what to do about this. Expect possible intermittent failures. SCO OpenUNIX 8 /etc/hw /etc/hw takes about a second to return the cpu frequency, which suggests perhaps it's measuring each time it runs. If this is annoying when running the speed program repeatedly then set a GMP_CPU_FREQUENCY environment variable (see TIME BASE section below). Timing on GNU/Linux On Linux, timing currently uses the cycle counter. This is unreliable, since the counter is not saved and restored at context switches (unlike FreeBSD and Solaris where the cycle counter is "virtualized"). Using the clock_gettime method with CLOCK_PROCESS_CPUTIME_ID (posix) or CLOCK_VIRTUAL (BSD) should be more reliable. To get clock_gettime with glibc, one has to link with -lrt (which also drags in the pthreads threading library). configure.in must be hacked to detect this and arrange proper linking. Something like old_LIBS="$LIBS" AC_SEARCH_LIBS(clock_gettime, rt, [AC_DEFINE(HAVE_CLOCK_GETTIME)]) TUNE_LIBS="$LIBS" LIBS="$old_LIBS" AC_SUBST(TUNE_LIBS) might work. Low resolution timebase Parameter tuning can be very time consuming if the only timebase available is a 10 millisecond clock tick, to the point of being unusable. This is currently the case on VAX and ARM systems. PARAMETER TUNING The "tuneup" program runs some tests designed to find the best settings for various thresholds, like MUL_TOOM22_THRESHOLD. Its output can be put into gmp-mparam.h. The program is built and run with make tune If the thresholds indicated are grossly different from the values in the selected gmp-mparam.h then there may be a performance boost in applicable size ranges by changing gmp-mparam.h accordingly. Be sure to do a full reconfigure and rebuild to get any newly set thresholds to take effect. A partial rebuild is enough sometimes, but a fresh configure and make is certain to be correct. If a CPU has specific tuned parameters coming from a gmp-mparam.h in one of the mpn subdirectories then the values from "make tune" should be similar. But check that the configured CPU is right and there are no machine specific effects causing a difference. It's hoped the compiler and options used won't have too much effect on thresholds, since for most CPUs they ultimately come down to comparisons between assembler subroutines. Missing out on the longlong.h macros by not using gcc will probably have an effect. Some thresholds produced by the tune program are merely single values chosen from what's a range of sizes where two algorithms are pretty much the same speed. When this happens the program is likely to give somewhat different values on successive runs. This is noticeable on the toom3 thresholds for instance. SPEED PROGRAM The "speed" program can be used for measuring and comparing various routines, and producing tables of data or gnuplot graphs. Compile it with make speed (Or on DOS systems "make speed.exe".) Here are some examples of how to use it. Check the code for all the options. Draw a graph of mpn_mul_n, stepping through sizes by 10 or a factor of 1.05 (whichever is greater). ./speed -s 10-5000 -t 10 -f 1.05 -P foo mpn_mul_n gnuplot foo.gnuplot Compare mpn_add_n and an mpn_lshift by 1, showing times in cycles and showing under mpn_lshift the difference between it and mpn_add_n. ./speed -s 1-40 -c -d mpn_add_n mpn_lshift.1 Using option -c for times in cycles is interesting but normally only necessary when looking carefully at assembler subroutines. You might think it would always give an integer value, but this doesn't happen in practice, probably due to overheads in the time measurements. In the free-form output the "#" symbol against a measurement means the corresponding routine is fastest at that size. This is a convenient visual cue when comparing different routines. The graph data files .data don't get this since it would upset gnuplot or other data viewers. TIME BASE The time measuring method is determined in time.c, based on what the configured host has available. A cycle counter is preferred, possibly supplemented by another method if the counter has a limited range. A microsecond accurate getrusage() or gettimeofday() will work quite well too. The cycle counters (except possibly on alpha) and gettimeofday() will depend on the machine being otherwise idle, or rather on other jobs not stealing CPU time from the measuring program. Short routines (those that complete within a timeslice) should work even on a busy machine. Some trouble is taken by speed_measure() in common.c to avoid ill effects from sporadic interrupts, or other intermittent things (like cron waking up every minute). But generally an idle machine will be necessary to be certain of consistent results. The CPU frequency is needed to convert between cycles and seconds, or for when a cycle counter is supplemented by getrusage() etc. The speed program will convert as necessary according to the output format requested. The tune program will work with either cycles or seconds. freq.c knows how to get the frequency on some systems, or can measure a cycle counter against gettimeofday() or getrusage(), but when that fails, or needs to be overridden, an environment variable GMP_CPU_FREQUENCY can be used (in Hertz). For example in "bash" on a 650 MHz machine, export GMP_CPU_FREQUENCY=650e6 A high precision time base makes it possible to get accurate measurements in a shorter time. EXAMPLE COMPARISONS - VARIOUS Here are some ideas for things that can be done with the speed program. There's always going to be a certain amount of overhead in the time measurements, due to reading the time base, and in the loop that runs a routine enough times to get a reading of the desired precision. Noop functions taking various arguments are available to measure this. The "overhead" printed by the speed program each time in its intro is the "noop" routine, but note that this is just for information, it isn't deducted from the times printed or anything. ./speed -s 1 noop noop_wxs noop_wxys To see how many cycles per limb a routine is taking, look at the time increase when the size increments, using option -D. This avoids fixed overheads in the measuring. Also, remember many of the assembler routines have unrolled loops, so it might be necessary to compare times at, say, 16, 32, 48, 64 etc to see what the unrolled part is taking, as opposed to any finishing off. ./speed -s 16-64 -t 16 -C -D mpn_add_n The -C option on its own gives cycles per limb, but is really only useful at big sizes where fixed overheads are small compared to the code doing the real work. Remember of course memory caching and/or page swapping will affect results at large sizes. ./speed -s 500000 -C mpn_add_n Once a calculation stops fitting in the CPU data cache, it's going to start taking longer. Exactly where this happens depends on the cache priming in the measuring routines, and on what sort of "least recently used" the hardware does. Here's an example for a CPU with a 16kbyte L1 data cache and 32-bit limb, showing a suddenly steeper curve for mpn_add_n at about 2000 limbs. ./speed -s 1-4000 -t 5 -f 1.02 -P foo mpn_add_n gnuplot foo.gnuplot When a routine has an unrolled loop for, say, multiples of 8 limbs and then an ordinary loop for the remainder, it can happen that it's actually faster to do an operation on, say, 8 limbs than it is on 7 limbs. The following draws a graph of mpn_sub_n, to see whether times smoothly increase with size. ./speed -s 1-100 -c -P foo mpn_sub_n gnuplot foo.gnuplot If mpn_lshift and mpn_rshift have special case code for shifts by 1, it ought to be faster (or at least not slower) than shifting by, say, 2 bits. ./speed -s 1-200 -c mpn_rshift.1 mpn_rshift.2 An mpn_lshift by 1 can be done by mpn_add_n adding a number to itself, and if the lshift isn't faster there's an obvious improvement that's possible. ./speed -s 1-200 -c mpn_lshift.1 mpn_add_n_self On some CPUs (AMD K6 for example) an "in-place" mpn_add_n where the destination is one of the sources is faster than a separate destination. Here's an example to see this. ".1" selects dst==src1 for mpn_add_n (and mpn_sub_n), for other values see speed.h SPEED_ROUTINE_MPN_BINARY_N_CALL. ./speed -s 1-200 -c mpn_add_n mpn_add_n.1 The gmp manual points out that divisions by powers of two should be done using a right shift because it'll be significantly faster than an actual division. The following shows by what factor mpn_rshift is faster than mpn_divrem_1, using division by 32 as an example. ./speed -s 10-20 -r mpn_rshift.5 mpn_divrem_1.32 EXAMPLE COMPARISONS - MULTIPLICATION mul_basecase takes a "." parameter. If positive, it gives the second (smaller) operand size. For example to show speeds for 3x3 up to 20x3 in cycles, ./speed -s 3-20 -c mpn_mul_basecase.3 A negative ".<-r>" parameter fixes the size of the product to the absolute value r. For example to show speeds for 10x10 up to 19x1 in cycles, ./speed -s 10-19 -c mpn_mul_basecase.-20 mul_basecase with no parameter does an NxN multiply, so for example to show speeds in cycles for 1x1, 2x2, 3x3, etc, up to 20x20, in cycles, ./speed -s 1-20 -c mpn_mul_basecase sqr_basecase is implemented by a "triangular" method on most CPUs, making it up to twice as fast as mul_basecase. In practice loop overheads and the products on the diagonal mean it falls short of this. Here's an example running the two and showing by what factor an NxN mul_basecase is slower than an NxN sqr_basecase. (Some versions of sqr_basecase only allow sizes below SQR_TOOM2_THRESHOLD, so if it crashes at that point don't worry.) ./speed -s 1-20 -r mpn_sqr_basecase mpn_mul_basecase The technique described above with -CD for showing the time difference in cycles per limb between two size operations can be done on an NxN mul_basecase using -E to change the basis for the size increment to N*N. For instance a 20x20 operation is taken to be doing 400 limbs, and a 16x16 doing 256 limbs. The following therefore shows the per crossproduct speed of mul_basecase and sqr_basecase at around 20x20 limbs. ./speed -s 16-20 -t 4 -CDE mpn_mul_basecase mpn_sqr_basecase Of course sqr_basecase isn't really doing NxN crossproducts, but it can be interesting to compare it to mul_basecase as if it was. For sqr_basecase the -F option can be used to base the deltas on N*(N+1)/2 operations, which is the triangular products sqr_basecase does. For example, ./speed -s 16-20 -t 4 -CDF mpn_sqr_basecase Both -E and -F are preliminary and might change. A consistent approach to using them when claiming certain per crossproduct or per triangularproduct speeds hasn't really been established, but the increment between speeds in the range karatsuba will call seems sensible, that being k to k/2. For instance, if the karatsuba threshold was 20 for the multiply and 30 for the square, ./speed -s 10-20 -t 10 -CDE mpn_mul_basecase ./speed -s 15-30 -t 15 -CDF mpn_sqr_basecase EXAMPLE COMPARISONS - MALLOC The gmp manual recommends application programs avoid excessive initializing and clearing of mpz_t variables (and mpq_t and mpf_t too). Every new variable will at a minimum go through an init, a realloc for its first store, and finally a clear. Quite how long that takes depends on the C library. The following compares an mpz_init/realloc/clear to a 10 limb mpz_add. Don't be surprised if the mallocing is quite slow. ./speed -s 10 -c mpz_init_realloc_clear mpz_add On some systems malloc and free are much slower when dynamic linked. The speed-dynamic program can be used to see this. For example the following measures malloc/free, first static then dynamic. ./speed -s 10 -c malloc_free ./speed-dynamic -s 10 -c malloc_free Of course a real world program has big problems if it's doing so many mallocs and frees that it gets slowed down by a dynamic linked malloc. EXAMPLE COMPARISONS - STRING CONVERSIONS mpn_get_str does a binary to string conversion. The base is specified with a "." parameter, or decimal by default. Power of 2 bases are much faster than general bases. The following compares decimal and hex for instance. ./speed -s 1-20 -c mpn_get_str mpn_get_str.16 Smaller bases need more divisions to split a given size number, and so are slower. The following compares base 3 and base 9. On small operands 9 will be nearly twice as fast, though at bigger sizes this reduces since in the current implementation both divide repeatedly by 3^20 (or 3^40 for 64 bit limbs) and those divisions come to dominate. ./speed -s 1-20 -cr mpn_get_str.3 mpn_get_str.9 mpn_set_str does a string to binary conversion. The base is specified with a "." parameter, or decimal by default. Power of 2 bases are faster than general bases on large conversions. ./speed -s 1-512 -f 2 -c mpn_set_str.8 mpn_set_str.10 mpn_set_str also has some special case code for decimal which is a bit faster than the general case, basically by giving the compiler a chance to optimize some multiplications by 10. ./speed -s 20-40 -c mpn_set_str.9 mpn_set_str.10 mpn_set_str.11 EXAMPLE COMPARISONS - GCDs mpn_gcd_1 has a threshold for when to reduce using an initial x%y when both x and y are single limbs. This isn't tuned currently, but a value can be established by a measurement like ./speed -s 10-32 mpn_gcd_1.10 This runs src[0] from 10 to 32 bits, and y fixed at 10 bits. If the div threshold is high, say 31 so it's effectively disabled then a 32x10 bit gcd is done by nibbling away at the 32-bit operands bit-by-bit. When the threshold is small, say 1 bit, then an initial x%y is done to reduce it to a 10x10 bit operation. The threshold in mpn/generic/gcd_1.c or the various assembler implementations can be tweaked up or down until there's no more speedups on interesting combinations of sizes. Note that this affects only a 1x1 limb operation and so isn't very important. (An Nx1 limb operation always does an initial modular reduction, using mpn_mod_1 or mpn_modexact_1_odd.) SPEED PROGRAM EXTENSIONS Potentially lots of things could be made available in the program, but it's been left at only the things that have actually been wanted and are likely to be reasonably useful in the future. Extensions should be fairly easy to make though. speed-ext.c is an example, in a style that should suit one-off tests, or new code fragments under development. many.pl is a script for generating a new speed program supplemented with alternate versions of the standard routines. It can be used for measuring experimental code, or for comparing different implementations that exist within a CPU family. THRESHOLD EXAMINING The speed program can be used to examine the speeds of different algorithms to check the tune program has done the right thing. For example to examine the karatsuba multiply threshold, ./speed -s 5-40 mpn_mul_basecase mpn_kara_mul_n When examining the toom3 threshold, remember it depends on the karatsuba threshold, so the right karatsuba threshold needs to be compiled into the library first. The tune program uses specially recompiled versions of mpn/mul_n.c etc for this reason, but the speed program simply uses the normal libgmp.la. Note further that the various routines may recurse into themselves on sizes far enough above applicable thresholds. For example, mpn_kara_mul_n will recurse into itself on sizes greater than twice the compiled-in MUL_TOOM22_THRESHOLD. When doing the above comparison between mul_basecase and kara_mul_n what's probably of interest is mul_basecase versus a kara_mul_n that does one level of Karatsuba then calls to mul_basecase, but this only happens on sizes less than twice the compiled MUL_TOOM22_THRESHOLD. A larger value for that setting can be compiled-in to avoid the problem if necessary. The same applies to toom3 and DC, though in a trickier fashion. There are some upper limits on some of the thresholds, arising from arrays dimensioned according to a threshold (mpn_mul_n), or asm code with certain sized displacements (some x86 versions of sqr_basecase). So putting huge values for the thresholds, even just for testing, may fail. FUTURE Make a program to check the time base is working properly, for small and large measurements. Make it able to test each available method, including perhaps the apparent resolution of each. Make a general mechanism for specifying operand overlap, and a syntax like maybe "mpn_add_n.dst=src2" to select it. Some measuring routines do this sort of thing with the "r" parameter currently. ---------------- Local variables: mode: text fill-column: 76 End: ======================================================================== * COPYING ======================================================================== GNU GENERAL PUBLIC LICENSE Version 3, 29 June 2007 Copyright (C) 2007 Free Software Foundation, Inc. Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The GNU General Public License is a free, copyleft license for software and other kinds of works. The licenses for most software and other practical works are designed to take away your freedom to share and change the works. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change all versions of a program--to make sure it remains free software for all its users. We, the Free Software Foundation, use the GNU General Public License for most of our software; it applies also to any other work released this way by its authors. You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for them if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs, and that you know you can do these things. To protect your rights, we need to prevent others from denying you these rights or asking you to surrender the rights. Therefore, you have certain responsibilities if you distribute copies of the software, or if you modify it: responsibilities to respect the freedom of others. For example, if you distribute copies of such a program, whether gratis or for a fee, you must pass on to the recipients the same freedoms that you received. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. Developers that use the GNU GPL protect your rights with two steps: (1) assert copyright on the software, and (2) offer you this License giving you legal permission to copy, distribute and/or modify it. For the developers' and authors' protection, the GPL clearly explains that there is no warranty for this free software. For both users' and authors' sake, the GPL requires that modified versions be marked as changed, so that their problems will not be attributed erroneously to authors of previous versions. Some devices are designed to deny users access to install or run modified versions of the software inside them, although the manufacturer can do so. This is fundamentally incompatible with the aim of protecting users' freedom to change the software. The systematic pattern of such abuse occurs in the area of products for individuals to use, which is precisely where it is most unacceptable. Therefore, we have designed this version of the GPL to prohibit the practice for those products. If such problems arise substantially in other domains, we stand ready to extend this provision to those domains in future versions of the GPL, as needed to protect the freedom of users. Finally, every program is threatened constantly by software patents. States should not allow patents to restrict development and use of software on general-purpose computers, but in those that do, we wish to avoid the special danger that patents applied to a free program could make it effectively proprietary. To prevent this, the GPL assures that patents cannot be used to render the program non-free. The precise terms and conditions for copying, distribution and modification follow. TERMS AND CONDITIONS 0. Definitions. "This License" refers to version 3 of the GNU General Public License. "Copyright" also means copyright-like laws that apply to other kinds of works, such as semiconductor masks. "The Program" refers to any copyrightable work licensed under this License. Each licensee is addressed as "you". "Licensees" and "recipients" may be individuals or organizations. To "modify" a work means to copy from or adapt all or part of the work in a fashion requiring copyright permission, other than the making of an exact copy. The resulting work is called a "modified version" of the earlier work or a work "based on" the earlier work. A "covered work" means either the unmodified Program or a work based on the Program. To "propagate" a work means to do anything with it that, without permission, would make you directly or secondarily liable for infringement under applicable copyright law, except executing it on a computer or modifying a private copy. Propagation includes copying, distribution (with or without modification), making available to the public, and in some countries other activities as well. To "convey" a work means any kind of propagation that enables other parties to make or receive copies. Mere interaction with a user through a computer network, with no transfer of a copy, is not conveying. An interactive user interface displays "Appropriate Legal Notices" to the extent that it includes a convenient and prominently visible feature that (1) displays an appropriate copyright notice, and (2) tells the user that there is no warranty for the work (except to the extent that warranties are provided), that licensees may convey the work under this License, and how to view a copy of this License. If the interface presents a list of user commands or options, such as a menu, a prominent item in the list meets this criterion. 1. Source Code. The "source code" for a work means the preferred form of the work for making modifications to it. "Object code" means any non-source form of a work. A "Standard Interface" means an interface that either is an official standard defined by a recognized standards body, or, in the case of interfaces specified for a particular programming language, one that is widely used among developers working in that language. The "System Libraries" of an executable work include anything, other than the work as a whole, that (a) is included in the normal form of packaging a Major Component, but which is not part of that Major Component, and (b) serves only to enable use of the work with that Major Component, or to implement a Standard Interface for which an implementation is available to the public in source code form. A "Major Component", in this context, means a major essential component (kernel, window system, and so on) of the specific operating system (if any) on which the executable work runs, or a compiler used to produce the work, or an object code interpreter used to run it. The "Corresponding Source" for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities. However, it does not include the work's System Libraries, or general-purpose tools or generally available free programs which are used unmodified in performing those activities but which are not part of the work. For example, Corresponding Source includes interface definition files associated with source files for the work, and the source code for shared libraries and dynamically linked subprograms that the work is specifically designed to require, such as by intimate data communication or control flow between those subprograms and other parts of the work. The Corresponding Source need not include anything that users can regenerate automatically from other parts of the Corresponding Source. The Corresponding Source for a work in source code form is that same work. 2. Basic Permissions. All rights granted under this License are granted for the term of copyright on the Program, and are irrevocable provided the stated conditions are met. This License explicitly affirms your unlimited permission to run the unmodified Program. The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work. This License acknowledges your rights of fair use or other equivalent, as provided by copyright law. You may make, run and propagate covered works that you do not convey, without conditions so long as your license otherwise remains in force. You may convey covered works to others for the sole purpose of having them make modifications exclusively for you, or provide you with facilities for running those works, provided that you comply with the terms of this License in conveying all material for which you do not control copyright. Those thus making or running the covered works for you must do so exclusively on your behalf, under your direction and control, on terms that prohibit them from making any copies of your copyrighted material outside their relationship with you. Conveying under any other circumstances is permitted solely under the conditions stated below. Sublicensing is not allowed; section 10 makes it unnecessary. 3. Protecting Users' Legal Rights From Anti-Circumvention Law. No covered work shall be deemed part of an effective technological measure under any applicable law fulfilling obligations under article 11 of the WIPO copyright treaty adopted on 20 December 1996, or similar laws prohibiting or restricting circumvention of such measures. When you convey a covered work, you waive any legal power to forbid circumvention of technological measures to the extent such circumvention is effected by exercising rights under this License with respect to the covered work, and you disclaim any intention to limit operation or modification of the work as a means of enforcing, against the work's users, your or third parties' legal rights to forbid circumvention of technological measures. 4. Conveying Verbatim Copies. You may convey verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice; keep intact all notices stating that this License and any non-permissive terms added in accord with section 7 apply to the code; keep intact all notices of the absence of any warranty; and give all recipients a copy of this License along with the Program. You may charge any price or no price for each copy that you convey, and you may offer support or warranty protection for a fee. 5. Conveying Modified Source Versions. You may convey a work based on the Program, or the modifications to produce it from the Program, in the form of source code under the terms of section 4, provided that you also meet all of these conditions: a) The work must carry prominent notices stating that you modified it, and giving a relevant date. b) The work must carry prominent notices stating that it is released under this License and any conditions added under section 7. This requirement modifies the requirement in section 4 to "keep intact all notices". c) You must license the entire work, as a whole, under this License to anyone who comes into possession of a copy. This License will therefore apply, along with any applicable section 7 additional terms, to the whole of the work, and all its parts, regardless of how they are packaged. This License gives no permission to license the work in any other way, but it does not invalidate such permission if you have separately received it. d) If the work has interactive user interfaces, each must display Appropriate Legal Notices; however, if the Program has interactive interfaces that do not display Appropriate Legal Notices, your work need not make them do so. A compilation of a covered work with other separate and independent works, which are not by their nature extensions of the covered work, and which are not combined with it such as to form a larger program, in or on a volume of a storage or distribution medium, is called an "aggregate" if the compilation and its resulting copyright are not used to limit the access or legal rights of the compilation's users beyond what the individual works permit. Inclusion of a covered work in an aggregate does not cause this License to apply to the other parts of the aggregate. 6. Conveying Non-Source Forms. You may convey a covered work in object code form under the terms of sections 4 and 5, provided that you also convey the machine-readable Corresponding Source under the terms of this License, in one of these ways: a) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by the Corresponding Source fixed on a durable physical medium customarily used for software interchange. b) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by a written offer, valid for at least three years and valid for as long as you offer spare parts or customer support for that product model, to give anyone who possesses the object code either (1) a copy of the Corresponding Source for all the software in the product that is covered by this License, on a durable physical medium customarily used for software interchange, for a price no more than your reasonable cost of physically performing this conveying of source, or (2) access to copy the Corresponding Source from a network server at no charge. c) Convey individual copies of the object code with a copy of the written offer to provide the Corresponding Source. This alternative is allowed only occasionally and noncommercially, and only if you received the object code with such an offer, in accord with subsection 6b. d) Convey the object code by offering access from a designated place (gratis or for a charge), and offer equivalent access to the Corresponding Source in the same way through the same place at no further charge. You need not require recipients to copy the Corresponding Source along with the object code. If the place to copy the object code is a network server, the Corresponding Source may be on a different server (operated by you or a third party) that supports equivalent copying facilities, provided you maintain clear directions next to the object code saying where to find the Corresponding Source. Regardless of what server hosts the Corresponding Source, you remain obligated to ensure that it is available for as long as needed to satisfy these requirements. e) Convey the object code using peer-to-peer transmission, provided you inform other peers where the object code and Corresponding Source of the work are being offered to the general public at no charge under subsection 6d. A separable portion of the object code, whose source code is excluded from the Corresponding Source as a System Library, need not be included in conveying the object code work. A "User Product" is either (1) a "consumer product", which means any tangible personal property which is normally used for personal, family, or household purposes, or (2) anything designed or sold for incorporation into a dwelling. In determining whether a product is a consumer product, doubtful cases shall be resolved in favor of coverage. For a particular product received by a particular user, "normally used" refers to a typical or common use of that class of product, regardless of the status of the particular user or of the way in which the particular user actually uses, or expects or is expected to use, the product. A product is a consumer product regardless of whether the product has substantial commercial, industrial or non-consumer uses, unless such uses represent the only significant mode of use of the product. "Installation Information" for a User Product means any methods, procedures, authorization keys, or other information required to install and execute modified versions of a covered work in that User Product from a modified version of its Corresponding Source. The information must suffice to ensure that the continued functioning of the modified object code is in no case prevented or interfered with solely because modification has been made. If you convey an object code work under this section in, or with, or specifically for use in, a User Product, and the conveying occurs as part of a transaction in which the right of possession and use of the User Product is transferred to the recipient in perpetuity or for a fixed term (regardless of how the transaction is characterized), the Corresponding Source conveyed under this section must be accompanied by the Installation Information. But this requirement does not apply if neither you nor any third party retains the ability to install modified object code on the User Product (for example, the work has been installed in ROM). The requirement to provide Installation Information does not include a requirement to continue to provide support service, warranty, or updates for a work that has been modified or installed by the recipient, or for the User Product in which it has been modified or installed. Access to a network may be denied when the modification itself materially and adversely affects the operation of the network or violates the rules and protocols for communication across the network. Corresponding Source conveyed, and Installation Information provided, in accord with this section must be in a format that is publicly documented (and with an implementation available to the public in source code form), and must require no special password or key for unpacking, reading or copying. 7. Additional Terms. "Additional permissions" are terms that supplement the terms of this License by making exceptions from one or more of its conditions. Additional permissions that are applicable to the entire Program shall be treated as though they were included in this License, to the extent that they are valid under applicable law. If additional permissions apply only to part of the Program, that part may be used separately under those permissions, but the entire Program remains governed by this License without regard to the additional permissions. When you convey a copy of a covered work, you may at your option remove any additional permissions from that copy, or from any part of it. (Additional permissions may be written to require their own removal in certain cases when you modify the work.) You may place additional permissions on material, added by you to a covered work, for which you have or can give appropriate copyright permission. Notwithstanding any other provision of this License, for material you add to a covered work, you may (if authorized by the copyright holders of that material) supplement the terms of this License with terms: a) Disclaiming warranty or limiting liability differently from the terms of sections 15 and 16 of this License; or b) Requiring preservation of specified reasonable legal notices or author attributions in that material or in the Appropriate Legal Notices displayed by works containing it; or c) Prohibiting misrepresentation of the origin of that material, or requiring that modified versions of such material be marked in reasonable ways as different from the original version; or d) Limiting the use for publicity purposes of names of licensors or authors of the material; or e) Declining to grant rights under trademark law for use of some trade names, trademarks, or service marks; or f) Requiring indemnification of licensors and authors of that material by anyone who conveys the material (or modified versions of it) with contractual assumptions of liability to the recipient, for any liability that these contractual assumptions directly impose on those licensors and authors. All other non-permissive additional terms are considered "further restrictions" within the meaning of section 10. If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term. If a license document contains a further restriction but permits relicensing or conveying under this License, you may add to a covered work material governed by the terms of that license document, provided that the further restriction does not survive such relicensing or conveying. If you add terms to a covered work in accord with this section, you must place, in the relevant source files, a statement of the additional terms that apply to those files, or a notice indicating where to find the applicable terms. Additional terms, permissive or non-permissive, may be stated in the form of a separately written license, or stated as exceptions; the above requirements apply either way. 8. Termination. You may not propagate or modify a covered work except as expressly provided under this License. Any attempt otherwise to propagate or modify it is void, and will automatically terminate your rights under this License (including any patent licenses granted under the third paragraph of section 11). However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation. Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice. Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, you do not qualify to receive new licenses for the same material under section 10. 9. Acceptance Not Required for Having Copies. You are not required to accept this License in order to receive or run a copy of the Program. Ancillary propagation of a covered work occurring solely as a consequence of using peer-to-peer transmission to receive a copy likewise does not require acceptance. However, nothing other than this License grants you permission to propagate or modify any covered work. These actions infringe copyright if you do not accept this License. Therefore, by modifying or propagating a covered work, you indicate your acceptance of this License to do so. 10. Automatic Licensing of Downstream Recipients. Each time you convey a covered work, the recipient automatically receives a license from the original licensors, to run, modify and propagate that work, subject to this License. You are not responsible for enforcing compliance by third parties with this License. An "entity transaction" is a transaction transferring control of an organization, or substantially all assets of one, or subdividing an organization, or merging organizations. If propagation of a covered work results from an entity transaction, each party to that transaction who receives a copy of the work also receives whatever licenses to the work the party's predecessor in interest had or could give under the previous paragraph, plus a right to possession of the Corresponding Source of the work from the predecessor in interest, if the predecessor has it or can get it with reasonable efforts. You may not impose any further restrictions on the exercise of the rights granted or affirmed under this License. For example, you may not impose a license fee, royalty, or other charge for exercise of rights granted under this License, and you may not initiate litigation (including a cross-claim or counterclaim in a lawsuit) alleging that any patent claim is infringed by making, using, selling, offering for sale, or importing the Program or any portion of it. 11. Patents. A "contributor" is a copyright holder who authorizes use under this License of the Program or a work on which the Program is based. The work thus licensed is called the contributor's "contributor version". A contributor's "essential patent claims" are all patent claims owned or controlled by the contributor, whether already acquired or hereafter acquired, that would be infringed by some manner, permitted by this License, of making, using, or selling its contributor version, but do not include claims that would be infringed only as a consequence of further modification of the contributor version. For purposes of this definition, "control" includes the right to grant patent sublicenses in a manner consistent with the requirements of this License. Each contributor grants you a non-exclusive, worldwide, royalty-free patent license under the contributor's essential patent claims, to make, use, sell, offer for sale, import and otherwise run, modify and propagate the contents of its contributor version. In the following three paragraphs, a "patent license" is any express agreement or commitment, however denominated, not to enforce a patent (such as an express permission to practice a patent or covenant not to sue for patent infringement). To "grant" such a patent license to a party means to make such an agreement or commitment not to enforce a patent against the party. If you convey a covered work, knowingly relying on a patent license, and the Corresponding Source of the work is not available for anyone to copy, free of charge and under the terms of this License, through a publicly available network server or other readily accessible means, then you must either (1) cause the Corresponding Source to be so available, or (2) arrange to deprive yourself of the benefit of the patent license for this particular work, or (3) arrange, in a manner consistent with the requirements of this License, to extend the patent license to downstream recipients. "Knowingly relying" means you have actual knowledge that, but for the patent license, your conveying the covered work in a country, or your recipient's use of the covered work in a country, would infringe one or more identifiable patents in that country that you have reason to believe are valid. If, pursuant to or in connection with a single transaction or arrangement, you convey, or propagate by procuring conveyance of, a covered work, and grant a patent license to some of the parties receiving the covered work authorizing them to use, propagate, modify or convey a specific copy of the covered work, then the patent license you grant is automatically extended to all recipients of the covered work and works based on it. A patent license is "discriminatory" if it does not include within the scope of its coverage, prohibits the exercise of, or is conditioned on the non-exercise of one or more of the rights that are specifically granted under this License. You may not convey a covered work if you are a party to an arrangement with a third party that is in the business of distributing software, under which you make payment to the third party based on the extent of your activity of conveying the work, and under which the third party grants, to any of the parties who would receive the covered work from you, a discriminatory patent license (a) in connection with copies of the covered work conveyed by you (or copies made from those copies), or (b) primarily for and in connection with specific products or compilations that contain the covered work, unless you entered into that arrangement, or that patent license was granted, prior to 28 March 2007. Nothing in this License shall be construed as excluding or limiting any implied license or other defenses to infringement that may otherwise be available to you under applicable patent law. 12. No Surrender of Others' Freedom. If conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot convey a covered work so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not convey it at all. For example, if you agree to terms that obligate you to collect a royalty for further conveying from those to whom you convey the Program, the only way you could satisfy both those terms and this License would be to refrain entirely from conveying the Program. 13. Use with the GNU Affero General Public License. Notwithstanding any other provision of this License, you have permission to link or combine any covered work with a work licensed under version 3 of the GNU Affero General Public License into a single combined work, and to convey the resulting work. The terms of this License will continue to apply to the part which is the covered work, but the special requirements of the GNU Affero General Public License, section 13, concerning interaction through a network will apply to the combination as such. 14. Revised Versions of this License. The Free Software Foundation may publish revised and/or new versions of the GNU General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies that a certain numbered version of the GNU General Public License "or any later version" applies to it, you have the option of following the terms and conditions either of that numbered version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of the GNU General Public License, you may choose any version ever published by the Free Software Foundation. If the Program specifies that a proxy can decide which future versions of the GNU General Public License can be used, that proxy's public statement of acceptance of a version permanently authorizes you to choose that version for the Program. Later license versions may give you additional or different permissions. However, no additional obligations are imposed on any author or copyright holder as a result of your choosing to follow a later version. 15. Disclaimer of Warranty. THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 16. Limitation of Liability. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 17. Interpretation of Sections 15 and 16. If the disclaimer of warranty and limitation of liability provided above cannot be given local legal effect according to their terms, reviewing courts shall apply local law that most closely approximates an absolute waiver of all civil liability in connection with the Program, unless a warranty or assumption of liability accompanies a copy of the Program in return for a fee. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively state the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . Also add information on how to contact you by electronic and paper mail. If the program does terminal interaction, make it output a short notice like this when it starts in an interactive mode: Copyright (C) This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, your program's commands might be different; for a GUI interface, you would use an "about box". You should also get your employer (if you work as a programmer) or school, if any, to sign a "copyright disclaimer" for the program, if necessary. For more information on this, and how to apply and follow the GNU GPL, see . The GNU General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License. But first, please read . ======================================================================== * COPYING.LESSERv3 ======================================================================== GNU LESSER GENERAL PUBLIC LICENSE Version 3, 29 June 2007 Copyright (C) 2007 Free Software Foundation, Inc. Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. This version of the GNU Lesser General Public License incorporates the terms and conditions of version 3 of the GNU General Public License, supplemented by the additional permissions listed below. 0. Additional Definitions. As used herein, "this License" refers to version 3 of the GNU Lesser General Public License, and the "GNU GPL" refers to version 3 of the GNU General Public License. "The Library" refers to a covered work governed by this License, other than an Application or a Combined Work as defined below. An "Application" is any work that makes use of an interface provided by the Library, but which is not otherwise based on the Library. Defining a subclass of a class defined by the Library is deemed a mode of using an interface provided by the Library. A "Combined Work" is a work produced by combining or linking an Application with the Library. The particular version of the Library with which the Combined Work was made is also called the "Linked Version". The "Minimal Corresponding Source" for a Combined Work means the Corresponding Source for the Combined Work, excluding any source code for portions of the Combined Work that, considered in isolation, are based on the Application, and not on the Linked Version. The "Corresponding Application Code" for a Combined Work means the object code and/or source code for the Application, including any data and utility programs needed for reproducing the Combined Work from the Application, but excluding the System Libraries of the Combined Work. 1. Exception to Section 3 of the GNU GPL. You may convey a covered work under sections 3 and 4 of this License without being bound by section 3 of the GNU GPL. 2. Conveying Modified Versions. If you modify a copy of the Library, and, in your modifications, a facility refers to a function or data to be supplied by an Application that uses the facility (other than as an argument passed when the facility is invoked), then you may convey a copy of the modified version: a) under this License, provided that you make a good faith effort to ensure that, in the event an Application does not supply the function or data, the facility still operates, and performs whatever part of its purpose remains meaningful, or b) under the GNU GPL, with none of the additional permissions of this License applicable to that copy. 3. Object Code Incorporating Material from Library Header Files. The object code form of an Application may incorporate material from a header file that is part of the Library. You may convey such object code under terms of your choice, provided that, if the incorporated material is not limited to numerical parameters, data structure layouts and accessors, or small macros, inline functions and templates (ten or fewer lines in length), you do both of the following: a) Give prominent notice with each copy of the object code that the Library is used in it and that the Library and its use are covered by this License. b) Accompany the object code with a copy of the GNU GPL and this license document. 4. Combined Works. You may convey a Combined Work under terms of your choice that, taken together, effectively do not restrict modification of the portions of the Library contained in the Combined Work and reverse engineering for debugging such modifications, if you also do each of the following: a) Give prominent notice with each copy of the Combined Work that the Library is used in it and that the Library and its use are covered by this License. b) Accompany the Combined Work with a copy of the GNU GPL and this license document. c) For a Combined Work that displays copyright notices during execution, include the copyright notice for the Library among these notices, as well as a reference directing the user to the copies of the GNU GPL and this license document. d) Do one of the following: 0) Convey the Minimal Corresponding Source under the terms of this License, and the Corresponding Application Code in a form suitable for, and under terms that permit, the user to recombine or relink the Application with a modified version of the Linked Version to produce a modified Combined Work, in the manner specified by section 6 of the GNU GPL for conveying Corresponding Source. 1) Use a suitable shared library mechanism for linking with the Library. A suitable mechanism is one that (a) uses at run time a copy of the Library already present on the user's computer system, and (b) will operate properly with a modified version of the Library that is interface-compatible with the Linked Version. e) Provide Installation Information, but only if you would otherwise be required to provide such information under section 6 of the GNU GPL, and only to the extent that such information is necessary to install and execute a modified version of the Combined Work produced by recombining or relinking the Application with a modified version of the Linked Version. (If you use option 4d0, the Installation Information must accompany the Minimal Corresponding Source and Corresponding Application Code. If you use option 4d1, you must provide the Installation Information in the manner specified by section 6 of the GNU GPL for conveying Corresponding Source.) 5. Combined Libraries. You may place library facilities that are a work based on the Library side by side in a single library together with other library facilities that are not Applications and are not covered by this License, and convey such a combined library under terms of your choice, if you do both of the following: a) Accompany the combined library with a copy of the same work based on the Library, uncombined with any other library facilities, conveyed under the terms of this License. b) Give prominent notice with the combined library that part of it is a work based on the Library, and explaining where to find the accompanying uncombined form of the same work. 6. Revised Versions of the GNU Lesser General Public License. The Free Software Foundation may publish revised and/or new versions of the GNU Lesser General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Library as you received it specifies that a certain numbered version of the GNU Lesser General Public License "or any later version" applies to it, you have the option of following the terms and conditions either of that published version or of any later version published by the Free Software Foundation. If the Library as you received it does not specify a version number of the GNU Lesser General Public License, you may choose any version of the GNU Lesser General Public License ever published by the Free Software Foundation. If the Library as you received it specifies that a proxy can decide whether future versions of the GNU Lesser General Public License shall apply, that proxy's public statement of acceptance of any version is permanent authorization for you to choose that version for the Library. ======================================================================== * COPYINGv2 ======================================================================== GNU GENERAL PUBLIC LICENSE Version 2, June 1991 Copyright (C) 1989, 1991 Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Lesser General Public License instead.) You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) year name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker. , 1 April 1989 Ty Coon, President of Vice This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License. ======================================================================== * COPYINGv3 ======================================================================== GNU GENERAL PUBLIC LICENSE Version 3, 29 June 2007 Copyright (C) 2007 Free Software Foundation, Inc. Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The GNU General Public License is a free, copyleft license for software and other kinds of works. The licenses for most software and other practical works are designed to take away your freedom to share and change the works. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change all versions of a program--to make sure it remains free software for all its users. We, the Free Software Foundation, use the GNU General Public License for most of our software; it applies also to any other work released this way by its authors. You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for them if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs, and that you know you can do these things. To protect your rights, we need to prevent others from denying you these rights or asking you to surrender the rights. Therefore, you have certain responsibilities if you distribute copies of the software, or if you modify it: responsibilities to respect the freedom of others. For example, if you distribute copies of such a program, whether gratis or for a fee, you must pass on to the recipients the same freedoms that you received. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. Developers that use the GNU GPL protect your rights with two steps: (1) assert copyright on the software, and (2) offer you this License giving you legal permission to copy, distribute and/or modify it. For the developers' and authors' protection, the GPL clearly explains that there is no warranty for this free software. For both users' and authors' sake, the GPL requires that modified versions be marked as changed, so that their problems will not be attributed erroneously to authors of previous versions. Some devices are designed to deny users access to install or run modified versions of the software inside them, although the manufacturer can do so. This is fundamentally incompatible with the aim of protecting users' freedom to change the software. The systematic pattern of such abuse occurs in the area of products for individuals to use, which is precisely where it is most unacceptable. Therefore, we have designed this version of the GPL to prohibit the practice for those products. If such problems arise substantially in other domains, we stand ready to extend this provision to those domains in future versions of the GPL, as needed to protect the freedom of users. Finally, every program is threatened constantly by software patents. States should not allow patents to restrict development and use of software on general-purpose computers, but in those that do, we wish to avoid the special danger that patents applied to a free program could make it effectively proprietary. To prevent this, the GPL assures that patents cannot be used to render the program non-free. The precise terms and conditions for copying, distribution and modification follow. TERMS AND CONDITIONS 0. Definitions. "This License" refers to version 3 of the GNU General Public License. "Copyright" also means copyright-like laws that apply to other kinds of works, such as semiconductor masks. "The Program" refers to any copyrightable work licensed under this License. Each licensee is addressed as "you". "Licensees" and "recipients" may be individuals or organizations. To "modify" a work means to copy from or adapt all or part of the work in a fashion requiring copyright permission, other than the making of an exact copy. The resulting work is called a "modified version" of the earlier work or a work "based on" the earlier work. A "covered work" means either the unmodified Program or a work based on the Program. To "propagate" a work means to do anything with it that, without permission, would make you directly or secondarily liable for infringement under applicable copyright law, except executing it on a computer or modifying a private copy. Propagation includes copying, distribution (with or without modification), making available to the public, and in some countries other activities as well. To "convey" a work means any kind of propagation that enables other parties to make or receive copies. Mere interaction with a user through a computer network, with no transfer of a copy, is not conveying. An interactive user interface displays "Appropriate Legal Notices" to the extent that it includes a convenient and prominently visible feature that (1) displays an appropriate copyright notice, and (2) tells the user that there is no warranty for the work (except to the extent that warranties are provided), that licensees may convey the work under this License, and how to view a copy of this License. If the interface presents a list of user commands or options, such as a menu, a prominent item in the list meets this criterion. 1. Source Code. The "source code" for a work means the preferred form of the work for making modifications to it. "Object code" means any non-source form of a work. A "Standard Interface" means an interface that either is an official standard defined by a recognized standards body, or, in the case of interfaces specified for a particular programming language, one that is widely used among developers working in that language. The "System Libraries" of an executable work include anything, other than the work as a whole, that (a) is included in the normal form of packaging a Major Component, but which is not part of that Major Component, and (b) serves only to enable use of the work with that Major Component, or to implement a Standard Interface for which an implementation is available to the public in source code form. A "Major Component", in this context, means a major essential component (kernel, window system, and so on) of the specific operating system (if any) on which the executable work runs, or a compiler used to produce the work, or an object code interpreter used to run it. The "Corresponding Source" for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities. However, it does not include the work's System Libraries, or general-purpose tools or generally available free programs which are used unmodified in performing those activities but which are not part of the work. For example, Corresponding Source includes interface definition files associated with source files for the work, and the source code for shared libraries and dynamically linked subprograms that the work is specifically designed to require, such as by intimate data communication or control flow between those subprograms and other parts of the work. The Corresponding Source need not include anything that users can regenerate automatically from other parts of the Corresponding Source. The Corresponding Source for a work in source code form is that same work. 2. Basic Permissions. All rights granted under this License are granted for the term of copyright on the Program, and are irrevocable provided the stated conditions are met. This License explicitly affirms your unlimited permission to run the unmodified Program. The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work. This License acknowledges your rights of fair use or other equivalent, as provided by copyright law. You may make, run and propagate covered works that you do not convey, without conditions so long as your license otherwise remains in force. You may convey covered works to others for the sole purpose of having them make modifications exclusively for you, or provide you with facilities for running those works, provided that you comply with the terms of this License in conveying all material for which you do not control copyright. Those thus making or running the covered works for you must do so exclusively on your behalf, under your direction and control, on terms that prohibit them from making any copies of your copyrighted material outside their relationship with you. Conveying under any other circumstances is permitted solely under the conditions stated below. Sublicensing is not allowed; section 10 makes it unnecessary. 3. Protecting Users' Legal Rights From Anti-Circumvention Law. No covered work shall be deemed part of an effective technological measure under any applicable law fulfilling obligations under article 11 of the WIPO copyright treaty adopted on 20 December 1996, or similar laws prohibiting or restricting circumvention of such measures. When you convey a covered work, you waive any legal power to forbid circumvention of technological measures to the extent such circumvention is effected by exercising rights under this License with respect to the covered work, and you disclaim any intention to limit operation or modification of the work as a means of enforcing, against the work's users, your or third parties' legal rights to forbid circumvention of technological measures. 4. Conveying Verbatim Copies. You may convey verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice; keep intact all notices stating that this License and any non-permissive terms added in accord with section 7 apply to the code; keep intact all notices of the absence of any warranty; and give all recipients a copy of this License along with the Program. You may charge any price or no price for each copy that you convey, and you may offer support or warranty protection for a fee. 5. Conveying Modified Source Versions. You may convey a work based on the Program, or the modifications to produce it from the Program, in the form of source code under the terms of section 4, provided that you also meet all of these conditions: a) The work must carry prominent notices stating that you modified it, and giving a relevant date. b) The work must carry prominent notices stating that it is released under this License and any conditions added under section 7. This requirement modifies the requirement in section 4 to "keep intact all notices". c) You must license the entire work, as a whole, under this License to anyone who comes into possession of a copy. This License will therefore apply, along with any applicable section 7 additional terms, to the whole of the work, and all its parts, regardless of how they are packaged. This License gives no permission to license the work in any other way, but it does not invalidate such permission if you have separately received it. d) If the work has interactive user interfaces, each must display Appropriate Legal Notices; however, if the Program has interactive interfaces that do not display Appropriate Legal Notices, your work need not make them do so. A compilation of a covered work with other separate and independent works, which are not by their nature extensions of the covered work, and which are not combined with it such as to form a larger program, in or on a volume of a storage or distribution medium, is called an "aggregate" if the compilation and its resulting copyright are not used to limit the access or legal rights of the compilation's users beyond what the individual works permit. Inclusion of a covered work in an aggregate does not cause this License to apply to the other parts of the aggregate. 6. Conveying Non-Source Forms. You may convey a covered work in object code form under the terms of sections 4 and 5, provided that you also convey the machine-readable Corresponding Source under the terms of this License, in one of these ways: a) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by the Corresponding Source fixed on a durable physical medium customarily used for software interchange. b) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by a written offer, valid for at least three years and valid for as long as you offer spare parts or customer support for that product model, to give anyone who possesses the object code either (1) a copy of the Corresponding Source for all the software in the product that is covered by this License, on a durable physical medium customarily used for software interchange, for a price no more than your reasonable cost of physically performing this conveying of source, or (2) access to copy the Corresponding Source from a network server at no charge. c) Convey individual copies of the object code with a copy of the written offer to provide the Corresponding Source. This alternative is allowed only occasionally and noncommercially, and only if you received the object code with such an offer, in accord with subsection 6b. d) Convey the object code by offering access from a designated place (gratis or for a charge), and offer equivalent access to the Corresponding Source in the same way through the same place at no further charge. You need not require recipients to copy the Corresponding Source along with the object code. If the place to copy the object code is a network server, the Corresponding Source may be on a different server (operated by you or a third party) that supports equivalent copying facilities, provided you maintain clear directions next to the object code saying where to find the Corresponding Source. Regardless of what server hosts the Corresponding Source, you remain obligated to ensure that it is available for as long as needed to satisfy these requirements. e) Convey the object code using peer-to-peer transmission, provided you inform other peers where the object code and Corresponding Source of the work are being offered to the general public at no charge under subsection 6d. A separable portion of the object code, whose source code is excluded from the Corresponding Source as a System Library, need not be included in conveying the object code work. A "User Product" is either (1) a "consumer product", which means any tangible personal property which is normally used for personal, family, or household purposes, or (2) anything designed or sold for incorporation into a dwelling. In determining whether a product is a consumer product, doubtful cases shall be resolved in favor of coverage. For a particular product received by a particular user, "normally used" refers to a typical or common use of that class of product, regardless of the status of the particular user or of the way in which the particular user actually uses, or expects or is expected to use, the product. A product is a consumer product regardless of whether the product has substantial commercial, industrial or non-consumer uses, unless such uses represent the only significant mode of use of the product. "Installation Information" for a User Product means any methods, procedures, authorization keys, or other information required to install and execute modified versions of a covered work in that User Product from a modified version of its Corresponding Source. The information must suffice to ensure that the continued functioning of the modified object code is in no case prevented or interfered with solely because modification has been made. If you convey an object code work under this section in, or with, or specifically for use in, a User Product, and the conveying occurs as part of a transaction in which the right of possession and use of the User Product is transferred to the recipient in perpetuity or for a fixed term (regardless of how the transaction is characterized), the Corresponding Source conveyed under this section must be accompanied by the Installation Information. But this requirement does not apply if neither you nor any third party retains the ability to install modified object code on the User Product (for example, the work has been installed in ROM). The requirement to provide Installation Information does not include a requirement to continue to provide support service, warranty, or updates for a work that has been modified or installed by the recipient, or for the User Product in which it has been modified or installed. Access to a network may be denied when the modification itself materially and adversely affects the operation of the network or violates the rules and protocols for communication across the network. Corresponding Source conveyed, and Installation Information provided, in accord with this section must be in a format that is publicly documented (and with an implementation available to the public in source code form), and must require no special password or key for unpacking, reading or copying. 7. Additional Terms. "Additional permissions" are terms that supplement the terms of this License by making exceptions from one or more of its conditions. Additional permissions that are applicable to the entire Program shall be treated as though they were included in this License, to the extent that they are valid under applicable law. If additional permissions apply only to part of the Program, that part may be used separately under those permissions, but the entire Program remains governed by this License without regard to the additional permissions. When you convey a copy of a covered work, you may at your option remove any additional permissions from that copy, or from any part of it. (Additional permissions may be written to require their own removal in certain cases when you modify the work.) You may place additional permissions on material, added by you to a covered work, for which you have or can give appropriate copyright permission. Notwithstanding any other provision of this License, for material you add to a covered work, you may (if authorized by the copyright holders of that material) supplement the terms of this License with terms: a) Disclaiming warranty or limiting liability differently from the terms of sections 15 and 16 of this License; or b) Requiring preservation of specified reasonable legal notices or author attributions in that material or in the Appropriate Legal Notices displayed by works containing it; or c) Prohibiting misrepresentation of the origin of that material, or requiring that modified versions of such material be marked in reasonable ways as different from the original version; or d) Limiting the use for publicity purposes of names of licensors or authors of the material; or e) Declining to grant rights under trademark law for use of some trade names, trademarks, or service marks; or f) Requiring indemnification of licensors and authors of that material by anyone who conveys the material (or modified versions of it) with contractual assumptions of liability to the recipient, for any liability that these contractual assumptions directly impose on those licensors and authors. All other non-permissive additional terms are considered "further restrictions" within the meaning of section 10. If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term. If a license document contains a further restriction but permits relicensing or conveying under this License, you may add to a covered work material governed by the terms of that license document, provided that the further restriction does not survive such relicensing or conveying. If you add terms to a covered work in accord with this section, you must place, in the relevant source files, a statement of the additional terms that apply to those files, or a notice indicating where to find the applicable terms. Additional terms, permissive or non-permissive, may be stated in the form of a separately written license, or stated as exceptions; the above requirements apply either way. 8. Termination. You may not propagate or modify a covered work except as expressly provided under this License. Any attempt otherwise to propagate or modify it is void, and will automatically terminate your rights under this License (including any patent licenses granted under the third paragraph of section 11). However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation. Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice. Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, you do not qualify to receive new licenses for the same material under section 10. 9. Acceptance Not Required for Having Copies. You are not required to accept this License in order to receive or run a copy of the Program. Ancillary propagation of a covered work occurring solely as a consequence of using peer-to-peer transmission to receive a copy likewise does not require acceptance. However, nothing other than this License grants you permission to propagate or modify any covered work. These actions infringe copyright if you do not accept this License. Therefore, by modifying or propagating a covered work, you indicate your acceptance of this License to do so. 10. Automatic Licensing of Downstream Recipients. Each time you convey a covered work, the recipient automatically receives a license from the original licensors, to run, modify and propagate that work, subject to this License. You are not responsible for enforcing compliance by third parties with this License. An "entity transaction" is a transaction transferring control of an organization, or substantially all assets of one, or subdividing an organization, or merging organizations. If propagation of a covered work results from an entity transaction, each party to that transaction who receives a copy of the work also receives whatever licenses to the work the party's predecessor in interest had or could give under the previous paragraph, plus a right to possession of the Corresponding Source of the work from the predecessor in interest, if the predecessor has it or can get it with reasonable efforts. You may not impose any further restrictions on the exercise of the rights granted or affirmed under this License. For example, you may not impose a license fee, royalty, or other charge for exercise of rights granted under this License, and you may not initiate litigation (including a cross-claim or counterclaim in a lawsuit) alleging that any patent claim is infringed by making, using, selling, offering for sale, or importing the Program or any portion of it. 11. Patents. A "contributor" is a copyright holder who authorizes use under this License of the Program or a work on which the Program is based. The work thus licensed is called the contributor's "contributor version". A contributor's "essential patent claims" are all patent claims owned or controlled by the contributor, whether already acquired or hereafter acquired, that would be infringed by some manner, permitted by this License, of making, using, or selling its contributor version, but do not include claims that would be infringed only as a consequence of further modification of the contributor version. For purposes of this definition, "control" includes the right to grant patent sublicenses in a manner consistent with the requirements of this License. Each contributor grants you a non-exclusive, worldwide, royalty-free patent license under the contributor's essential patent claims, to make, use, sell, offer for sale, import and otherwise run, modify and propagate the contents of its contributor version. In the following three paragraphs, a "patent license" is any express agreement or commitment, however denominated, not to enforce a patent (such as an express permission to practice a patent or covenant not to sue for patent infringement). To "grant" such a patent license to a party means to make such an agreement or commitment not to enforce a patent against the party. If you convey a covered work, knowingly relying on a patent license, and the Corresponding Source of the work is not available for anyone to copy, free of charge and under the terms of this License, through a publicly available network server or other readily accessible means, then you must either (1) cause the Corresponding Source to be so available, or (2) arrange to deprive yourself of the benefit of the patent license for this particular work, or (3) arrange, in a manner consistent with the requirements of this License, to extend the patent license to downstream recipients. "Knowingly relying" means you have actual knowledge that, but for the patent license, your conveying the covered work in a country, or your recipient's use of the covered work in a country, would infringe one or more identifiable patents in that country that you have reason to believe are valid. If, pursuant to or in connection with a single transaction or arrangement, you convey, or propagate by procuring conveyance of, a covered work, and grant a patent license to some of the parties receiving the covered work authorizing them to use, propagate, modify or convey a specific copy of the covered work, then the patent license you grant is automatically extended to all recipients of the covered work and works based on it. A patent license is "discriminatory" if it does not include within the scope of its coverage, prohibits the exercise of, or is conditioned on the non-exercise of one or more of the rights that are specifically granted under this License. You may not convey a covered work if you are a party to an arrangement with a third party that is in the business of distributing software, under which you make payment to the third party based on the extent of your activity of conveying the work, and under which the third party grants, to any of the parties who would receive the covered work from you, a discriminatory patent license (a) in connection with copies of the covered work conveyed by you (or copies made from those copies), or (b) primarily for and in connection with specific products or compilations that contain the covered work, unless you entered into that arrangement, or that patent license was granted, prior to 28 March 2007. Nothing in this License shall be construed as excluding or limiting any implied license or other defenses to infringement that may otherwise be available to you under applicable patent law. 12. No Surrender of Others' Freedom. If conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot convey a covered work so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not convey it at all. For example, if you agree to terms that obligate you to collect a royalty for further conveying from those to whom you convey the Program, the only way you could satisfy both those terms and this License would be to refrain entirely from conveying the Program. 13. Use with the GNU Affero General Public License. Notwithstanding any other provision of this License, you have permission to link or combine any covered work with a work licensed under version 3 of the GNU Affero General Public License into a single combined work, and to convey the resulting work. The terms of this License will continue to apply to the part which is the covered work, but the special requirements of the GNU Affero General Public License, section 13, concerning interaction through a network will apply to the combination as such. 14. Revised Versions of this License. The Free Software Foundation may publish revised and/or new versions of the GNU General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies that a certain numbered version of the GNU General Public License "or any later version" applies to it, you have the option of following the terms and conditions either of that numbered version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of the GNU General Public License, you may choose any version ever published by the Free Software Foundation. If the Program specifies that a proxy can decide which future versions of the GNU General Public License can be used, that proxy's public statement of acceptance of a version permanently authorizes you to choose that version for the Program. Later license versions may give you additional or different permissions. However, no additional obligations are imposed on any author or copyright holder as a result of your choosing to follow a later version. 15. Disclaimer of Warranty. THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 16. Limitation of Liability. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 17. Interpretation of Sections 15 and 16. If the disclaimer of warranty and limitation of liability provided above cannot be given local legal effect according to their terms, reviewing courts shall apply local law that most closely approximates an absolute waiver of all civil liability in connection with the Program, unless a warranty or assumption of liability accompanies a copy of the Program in return for a fee. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively state the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . Also add information on how to contact you by electronic and paper mail. If the program does terminal interaction, make it output a short notice like this when it starts in an interactive mode: Copyright (C) This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, your program's commands might be different; for a GUI interface, you would use an "about box". You should also get your employer (if you work as a programmer) or school, if any, to sign a "copyright disclaimer" for the program, if necessary. For more information on this, and how to apply and follow the GNU GPL, see . The GNU General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License. But first, please read .