
Developing Formula Evaluation

Introduction

This document is for developers wishing to contribute to the FormulaEvaluator API functionality.

When evaluating workbooks you may encounter an org.apache.poi.ss.formula.eval.NotImplementedException, which indicates that a function is not (yet) supported by POI. Is there a workaround? Yes, the POI framework makes it easy to add implementations of new functions. Prior to POI-3.8 you had to check out the source code from svn and make a custom build with your function implementation. Since POI-3.8 you can register new functions at run-time.

Currently, contributions are wanted for implementing the standard MS Excel functions. Placeholder classes for these have been created; contributors only need to supply the implementation of each class's evaluate() method, which does the actual evaluation.

Overview of FormulaEvaluator

Briefly, a formula string (along with the sheet and workbook that form the context in which the formula is evaluated) is first parsed into Reverse Polish Notation (RPN) tokens using the FormulaParser class. (If you don't know what RPN tokens are, now is a good time to read Anthony Stone's description of RPN.)

The big picture

RPN tokens are mapped to Eval classes. (The class hierarchy for the Evals is best understood if you view it in a class diagram viewer.) Depending on the type of RPN token (also called Ptgs henceforth, since that is what the FormulaParser calls the classes), a specific type of Eval wrapper is constructed to wrap the RPN token and is pushed on the stack, unless the Ptg is an OperationPtg. If it is an OperationPtg, an OperationEval instance is created for the specific type of OperationPtg. Depending on how many operands the operation takes, that many Evals are popped off the stack and passed in an array to the OperationEval instance's evaluate method, which returns an Eval of subtype ValueEval. Thus an operation in the formula is evaluated.

Note
An Eval is of subinterface ValueEval or OperationEval. Operands are always ValueEvals, and operations are always OperationEvals.

OperationEval.evaluate(Eval[]) returns an Eval which is supposed to be an instance of one of the implementations of ValueEval. The ValueEval resulting from evaluate() is pushed on the stack and the next RPN token is evaluated. This continues until eventually there are no more RPN tokens, at which point, if the formula string was correctly parsed, there should be just one Eval on the stack — which contains the result of evaluating the formula.
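
The push-and-pop cycle described above can be illustrated with a minimal, stand-alone sketch. It is not POI code: RpnSketch is a hypothetical class that works on plain numbers and the four arithmetic operators instead of Ptgs and Evals, and it assumes every operator takes exactly two operands.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Stand-alone sketch of stack-based RPN evaluation; hypothetical, not POI code.
final class RpnSketch {

    // Evaluates whitespace-separated RPN tokens such as "3 4 + 2 *".
    static double evaluate(String rpn) {
        Deque<Double> stack = new ArrayDeque<>();
        for (String token : rpn.trim().split("\\s+")) {
            switch (token) {
                case "+": case "-": case "*": case "/": {
                    // An operator pops as many operands as it takes (two here) ...
                    double right = stack.pop();
                    double left = stack.pop();
                    // ... and its result is pushed back as a new operand.
                    stack.push(apply(token, left, right));
                    break;
                }
                default:
                    // Anything else is treated as a numeric operand.
                    stack.push(Double.parseDouble(token));
            }
        }
        // A correctly formed expression leaves exactly one value: the result.
        if (stack.size() != 1) {
            throw new IllegalStateException("malformed RPN: " + rpn);
        }
        return stack.pop();
    }

    private static double apply(String op, double left, double right) {
        switch (op) {
            case "+": return left + right;
            case "-": return left - right;
            case "*": return left * right;
            case "/": return left / right;
            default: throw new IllegalArgumentException("unknown operator: " + op);
        }
    }
}
```

For example, RpnSketch.evaluate("3 4 + 2 *") pushes 3 and 4, replaces them with 7, pushes 2, and leaves 14.0 as the single remaining value.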

Two special Ptgs, AreaPtg and RefPtg, are handled a little differently, but the code for that should be self-explanatory. Very briefly, the cells included in an AreaPtg or RefPtg are examined and their values are populated in individual ValueEval objects, which are set into the implementations of AreaEval and RefEval.

OperationEvals for the standard operators have been implemented and tested.

What functions are supported?

As of July 2021, POI supports 302 built-in functions; see Appendix A for the full list. You can programmatically list supported and unsupported functions using the following helper methods:

import java.util.Collection;
import org.apache.poi.ss.formula.WorkbookEvaluator;
// list of functions that POI can evaluate
Collection<String> supportedFuncs = WorkbookEvaluator.getSupportedFunctionNames();
// list of functions that are not supported by POI
Collection<String> unsupportedFuncs = WorkbookEvaluator.getNotSupportedFunctionNames();

I need a function that isn't supported!

If you need a function that POI doesn't currently support, you have two options. You can create the function yourself, and have your program add it to POI at run-time. Doing this will help you get the function you need as soon as possible. The other option is to create the function yourself, and build it into the POI library, possibly contributing the code to the POI project. Doing this will help you get the function you need, but you'll have to build POI from source yourself. And if you contribute the code, you'll help others who need the function in the future, because it will already be supported in the next release of POI. The two options require almost identical code, but the process of deploying the function is different. If your function is a User Defined Function, you'll always take the run-time option, as POI doesn't distribute UDFs.

In the sections ahead, we'll implement the Excel SQRTPI() function, first at run-time, and then we'll show how to change it to a library-based implementation.

Two base interfaces to start your implementation

All Excel formula function classes implement either the org.apache.poi.ss.formula.functions.Function or the org.apache.poi.ss.formula.functions.FreeRefFunction interface. Function is a common interface for the functions defined in the Binary Excel File Format (BIFF8): these are "classic" Excel functions like SUM, COUNT, LOOKUP, etc. FreeRefFunction is a common interface for the functions from the Excel Analysis ToolPak, for User Defined Functions that you create, and for Excel built-in functions that have been defined since BIFF8 was defined. In the future these two interfaces are expected to be unified into one, but for now you have to start your implementation from two slightly different roots.

Which interface to start from?

You are about to implement a function and don't know which interface to start from: Function or FreeRefFunction. You should use Function if the function is part of the Excel BIFF8 definition, and FreeRefFunction for a function that is part of the Excel Analysis ToolPak, was added to Excel after BIFF8, or that you are creating yourself.

You can check the list of Analysis ToolPak functions defined in org.apache.poi.ss.formula.atp.AnalysisToolPak.createFunctionsMap() to see if the function is part of the Analysis ToolPak. The list of BIFF8 functions is defined in a text file, poi/src/main/resources/org/apache/poi/ss/formula/function/functionMetadata.txt.

You can also use the following code to check which base class your function should implement, if it is not a User Defined function (UDFs must implement FreeRefFunction):

import org.apache.poi.ss.formula.atp.AnalysisToolPak;
if (!AnalysisToolPak.isATPFunction(functionName)) {
    // the function must implement org.apache.poi.ss.formula.functions.Function
} else {
    // the function must implement org.apache.poi.ss.formula.functions.FreeRefFunction
}

Implementing a function

Here is the fun part: let's walk through the implementation of the Excel function SQRTPI(), which POI does not currently support.

AnalysisToolPak.isATPFunction("SQRTPI") returns true, so this is an Analysis ToolPak function. Thus the base interface must be FreeRefFunction. The same would be true if we were implementing a UDF.

Because we're taking the run-time deployment option, we'll create this new function in a source file in our own program. Our function will return an Eval that is either its proper result, or an ErrorEval that describes the error. All that work is done in the function's evaluate() method:

package ...;
import org.apache.poi.ss.formula.OperationEvaluationContext;
import org.apache.poi.ss.formula.eval.EvaluationException;
import org.apache.poi.ss.formula.eval.ErrorEval;
import org.apache.poi.ss.formula.eval.NumberEval;
import org.apache.poi.ss.formula.eval.OperandResolver;
import org.apache.poi.ss.formula.eval.ValueEval;
import org.apache.poi.ss.formula.functions.FreeRefFunction;
public final class SqrtPi implements FreeRefFunction {
    @Override
    public ValueEval evaluate(ValueEval[] args, OperationEvaluationContext ec) {
        // SQRTPI() takes exactly one argument
        if (args.length != 1) {
            return ErrorEval.VALUE_INVALID;
        }
        ValueEval arg0 = args[0];
        int srcRowIndex = ec.getRowIndex();
        int srcColumnIndex = ec.getColumnIndex();
        try {
            // Retrieves a single value from a variety of different argument types according to standard
            // Excel rules. Does not perform any type conversion.
            ValueEval ve = OperandResolver.getSingleValue(arg0, srcRowIndex, srcColumnIndex);
            // Applies some conversion rules if the supplied value is not already a number.
            // Throws EvaluationException(#VALUE!) if the supplied parameter is not a number
            double arg = OperandResolver.coerceValueToDouble(ve);
            // this is where all the heavy lifting happens
            double result = Math.sqrt(arg * Math.PI);
            // Excel uses the error code #NUM! instead of IEEE NaN and Infinity,
            // so when a numeric function evaluates to NaN or an infinite value,
            // be sure to translate the result to the appropriate error code
            if (Double.isNaN(result) || Double.isInfinite(result)) {
                throw new EvaluationException(ErrorEval.NUM_ERROR);
            }
            return new NumberEval(result);
        } catch (EvaluationException e) {
            return e.getErrorEval();
        }
    }
}

If our function had been one of the BIFF8 Excel built-ins, it would have been based on the Function interface instead. There are sub-interfaces of Function that make life easier when implementing numeric functions or functions with a small, fixed number of arguments:

  • org.apache.poi.ss.formula.functions.NumericFunction
  • org.apache.poi.ss.formula.functions.Fixed0ArgFunction
  • org.apache.poi.ss.formula.functions.Fixed1ArgFunction
  • org.apache.poi.ss.formula.functions.Fixed2ArgFunction
  • org.apache.poi.ss.formula.functions.Fixed3ArgFunction
  • org.apache.poi.ss.formula.functions.Fixed4ArgFunction

Since SQRTPI() takes exactly one argument, we would start our implementation from Fixed1ArgFunction. The differences for a BIFF8 Fixed1ArgFunction are pretty small:

package ...;
import org.apache.poi.ss.formula.eval.EvaluationException;
import org.apache.poi.ss.formula.eval.ErrorEval;
import org.apache.poi.ss.formula.eval.NumberEval;
import org.apache.poi.ss.formula.eval.OperandResolver;
import org.apache.poi.ss.formula.eval.ValueEval;
import org.apache.poi.ss.formula.functions.Fixed1ArgFunction;
public final class SqrtPi extends Fixed1ArgFunction {
    @Override
    public ValueEval evaluate(int srcRowIndex, int srcColumnIndex, ValueEval arg0) {
        try {
            // same body as in the FreeRefFunction version above
            ...
        } catch (EvaluationException e) {
            return e.getErrorEval();
        }
    }
}

Now that the implementation is ready, we need to register it with the formula evaluator. This is the same no matter which kind of function we're creating. We simply add the following line to the program that is using POI:

WorkbookEvaluator.registerFunction("SQRTPI", new SqrtPi());

Voila! The formula evaluator now recognizes SQRTPI()!
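
To picture what run-time registration buys you, here is a toy model of a name-keyed function registry. It is only a sketch under simplifying assumptions: FunctionRegistrySketch and its methods are hypothetical, functions take a single double, and nothing here reflects how POI stores registered functions internally.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

// Toy model of a run-time function registry; all names are hypothetical
// and this is not POI's implementation.
final class FunctionRegistrySketch {
    private final Map<String, DoubleUnaryOperator> functions = new HashMap<>();

    // Registers an implementation under a case-insensitive function name.
    void register(String name, DoubleUnaryOperator impl) {
        functions.put(name.toUpperCase(Locale.ROOT), impl);
    }

    // Looks the function up by name and applies it. An unknown name fails,
    // much as an evaluator fails on a function it has no implementation for.
    double call(String name, double arg) {
        DoubleUnaryOperator impl = functions.get(name.toUpperCase(Locale.ROOT));
        if (impl == null) {
            throw new UnsupportedOperationException(name + " is not implemented");
        }
        return impl.applyAsDouble(arg);
    }
}
```

In this model, calling call("SQRTPI", 1.0) fails until register("SQRTPI", x -> Math.sqrt(x * Math.PI)) has been invoked, after which the same call succeeds without rebuilding anything.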

Moving the function into the library

If we choose instead to implement our function as part of the POI library, the code is nearly identical. All POI functions are part of one of two Java packages: org.apache.poi.ss.formula.functions for BIFF8 Excel built-in functions, and org.apache.poi.ss.formula.atp for Analysis ToolPak functions. The function still needs to implement the appropriate base class, just as before. To implement our SQRTPI() function in the POI library, we need to move the source code to poi/src/main/java/org/apache/poi/ss/formula/atp/SqrtPi.java in the POI source code, change the package statement, and add a singleton instance:

package org.apache.poi.ss.formula.atp;
...
public final class SqrtPi implements FreeRefFunction {
    public static final FreeRefFunction instance = new SqrtPi();
    private SqrtPi() {
        // Enforce singleton
    }
    ...
}

If our function had been one of the BIFF8 Excel built-ins, we would instead have moved the source code to poi/src/main/java/org/apache/poi/ss/formula/functions/SqrtPi.java in the POI source code, and changed the package statement to:

package org.apache.poi.ss.formula.functions;

POI library functions are registered differently from run-time-deployed functions. Again, the techniques differ for the two types of library functions (remembering that POI never releases the third type, UDFs). For our Analysis ToolPak function, we have to update the list of functions in org.apache.poi.ss.formula.atp.AnalysisToolPak.createFunctionsMap():

...
private Map<String, FreeRefFunction> createFunctionsMap() {
    Map<String, FreeRefFunction> m = new HashMap<>(114);
    ...
    r(m, "SQRTPI", SqrtPi.instance);
    ...
}
...

If our function had been one of the BIFF8 Excel built-ins, the registration instead would require updating an entry in the formula-function table, poi/src/main/resources/org/apache/poi/ss/formula/function/functionMetadata.txt:

...
#Columns: (index, name, minParams, maxParams, returnClass, paramClasses, isVolatile, hasFootnote )
...
359 SQRTPI 1 1 V V
...

and also updating the list of function implementations in org.apache.poi.ss.formula.eval.FunctionEval.produceFunctions():

...
private static Function[] produceFunctions() {
    ...
    retval[359] = new SqrtPi();
    ...
}
...
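
To make the column layout of a functionMetadata.txt entry concrete, the sketch below splits one data line into the first six columns named in the "#Columns:" comment. It is a stand-alone illustration, not POI's own reader: FunctionRow is a hypothetical type, and the code assumes that columns are separated by tab characters and that paramClasses holds space-separated codes.

```java
// Stand-alone sketch; FunctionRow is a hypothetical type, not part of POI.
final class FunctionRow {
    final int index;
    final String name;
    final int minParams;
    final int maxParams;
    final String returnClass;
    final String[] paramClasses;

    private FunctionRow(int index, String name, int minParams, int maxParams,
                        String returnClass, String[] paramClasses) {
        this.index = index;
        this.name = name;
        this.minParams = minParams;
        this.maxParams = maxParams;
        this.returnClass = returnClass;
        this.paramClasses = paramClasses;
    }

    // Parses one data line, assuming tab-separated columns in the order
    // (index, name, minParams, maxParams, returnClass, paramClasses, ...).
    // The trailing isVolatile and hasFootnote columns are ignored here.
    static FunctionRow parse(String line) {
        String[] f = line.split("\t", -1);
        if (f.length < 6) {
            throw new IllegalArgumentException("expected at least 6 columns: " + line);
        }
        String params = f[5].trim();
        String[] paramClasses = params.isEmpty() ? new String[0] : params.split(" +");
        return new FunctionRow(
                Integer.parseInt(f[0].trim()), f[1].trim(),
                Integer.parseInt(f[2].trim()), Integer.parseInt(f[3].trim()),
                f[4].trim(), paramClasses);
    }
}
```

Read this way, the SQRTPI entry shown earlier describes function index 359, taking a minimum and a maximum of one parameter, with return class V and a single parameter of class V.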

Floating Point Arithmetic in Excel

Excel uses the IEEE Standard for Double Precision Floating Point numbers, except in two cases where it does not adhere to IEEE 754:

  1. Positive and Negative Infinities: Infinities occur when you divide by 0. Excel does not support infinities, rather, it gives a #DIV/0! error in these cases.
  2. Not-a-Number (NaN): NaN is used to represent invalid operations (such as infinity/infinity, infinity-infinity, or the square root of -1). NaNs allow a program to continue past an invalid operation. Excel instead immediately generates an error such as #NUM! or #DIV/0!.

Be aware of these two cases when saving results of your scientific calculations in Excel: “where are my Infinities and NaNs? They are gone!”
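
In code, this means checking a double result before handing it back, as the SqrtPi implementation earlier in this document does. The following stand-alone sketch shows that check in isolation. The class and method names are hypothetical and there is no POI dependency; it follows the same convention as that implementation and reports #NUM! for both NaN and infinite results.

```java
// Stand-alone sketch; the class and method names are hypothetical, not POI API.
final class ExcelDoubleSketch {

    // Returns the number as text, or "#NUM!" when IEEE 754 arithmetic
    // produced NaN or an infinity, which Excel cannot represent.
    static String toExcelResult(double value) {
        if (Double.isNaN(value) || Double.isInfinite(value)) {
            return "#NUM!";
        }
        return Double.toString(value);
    }

    // Numeric core of SQRTPI as implemented in this document: sqrt(x * pi).
    static double sqrtPi(double x) {
        return Math.sqrt(x * Math.PI);
    }
}
```

With this helper, sqrtPi(-1.0) yields NaN in Java (the square root of a negative number), so toExcelResult reports "#NUM!" instead of letting the NaN leak into the result.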

Testing Framework

Automated testing of an implemented Function is easy. The source code for this is in the file org.apache.poi.hssf.record.formula.GenericFormulaTestCase.java. This class has a reference to the test xls file (not a test xls, the test xls :) ), which may need to be changed for your environment. Once you have done that, locate the entry in the test xls for the function that you have implemented and enter different tests in a cell in the FORMULA row. Then copy the "value of" the formula that you entered into the cell just below it (this is easily done in Excel as: [copy the formula cell] > [go to cell below] > Edit > Paste Special > Values > "ok"). You can enter multiple such formulas and paste their values in the cells below them, and the test framework will automatically check whether each formula evaluation matches the expected value. (Again, this is hard to put into words, so please take the time to look at the code and the currently entered tests in the patch attachment "FormulaEvalTestData.xls".)

Note
This style of testing appears to have been abandoned. This section needs to be completely rewritten.

Appendix A — Functions supported by POI

Functions supported by POI (as of July 2021)

ABS
ABSREF
ACOS
ACOSH
ADDRESS
AND
APP.TITLE
AREAS
ARGUMENT
ASC
ASIN
ASINH
ATAN
ATAN2
ATANH
AVEDEV
AVERAGE
AVERAGEA
BETADIST
BETAINV
BINOMDIST
BIN2DEC
CALL
CEILING
CELL
CHAR
CHIDIST
CHIINV
CHITEST
CHOOSE
CLEAN
CODE
COLUMN
COLUMNS
COMBIN
COMPLEX
CONCAT
CONCATENATE
CONFIDENCE
CORREL
COS
COSH
COUNT
COUNTA
COUNTBLANK
COUNTIF
COUNTIFS
COVAR
CRITBINOM
DATE
DATEDIF
DATESTRING
DAVERAGE
DAY
DAYS360
DB
DBCS
DCOUNT
DCOUNTA
DDB
DEC2BIN
DEC2HEX
DEGREES
DELTA
DEVSQ
DGET
DMAX
DMIN
DOLLAR
DPRODUCT
DSTDEV
DSTDEVP
DSUM
DVAR
DVARP
EDATE
ENABLE.TOOL
END.IF
EOMONTH
ERROR
ERROR.TYPE
EVALUATE
EVEN
EXACT
EXEC
EXP
EXPONDIST
FACT
FACTDOUBLE
FALSE
FDIST
FIND
FINDB
FINV
FISHER
FISHERINV
FIXED
FLOOR
FORECAST
FREQUENCY
FTEST
FV
GAMMADIST
GAMMAINV
GAMMALN
GEOMEAN
GET.CELL
GET.DOCUMENT
GET.WINDOW
GET.WORKBOOK
GET.WORKSPACE
GETPIVOTDATA
GOTO
GROWTH
HARMEAN
HEX2DEC
HLOOKUP
HOUR
HYPERLINK
HYPGEOMDIST
IF
IFS
IFERROR
IFNA
IMAGINARY
IMREAL
INDEX
INDIRECT
INFO
INT
INTERCEPT
IPMT
IRR
ISBLANK
ISERR
ISERROR
ISEVEN
ISLOGICAL
ISNA
ISNONTEXT
ISNUMBER
ISODD
ISPMT
ISREF
ISTEXT
JIS
KURT
LARGE
LAST.ERROR
LEFT
LEFTB
LEN
LENB
LINEST
LINEST
LN
LOG
LOG10
LOGEST
LOGEST
LOGINV
LOGNORMDIST
LOOKUP
LOWER
MATCH
MAX
MAXA
MDETERM
MEDIAN
MID
MIDB
MIN
MINA
MINUTE
MINVERSE
MIRR
MMULT
MOD
MODE
MONTH
MROUND
N
NA
NEGBINOMDIST
NETWORKDAYS
NORMDIST
NORMINV
NORMSDIST
NORMSINV
NOT
NOW
NPER
NPV
NUMBERSTRING
OCT2DEC
ODD
OFFSET
OR
PEARSON
PERCENTILE
PERCENTRANK
PERMUT
PHONETIC
PI
PMT
POISSON
POWER
PPMT
PRESS.TOOL
PROB
PRODUCT
PROPER
PV
QUARTILE
QUOTIENT
RADIANS
RAND
RANDBETWEEN
RANK
RATE
REGISTER.ID
RELREF
REPLACEB
REPLACE
REPT
RETURN
RIGHT
RIGHTB
ROMAN
ROUND
ROUNDDOWN
ROUNDUP
ROW
ROWS
RSQ
SAVE.TOOLBAR
SEARCH
SEARCHB
SECOND
SIGN
SIN
SINGLE
SINH
SKEW
SLN
SLOPE
SMALL
SQRT
STANDARDIZE
STDEV
STDEVA
STDEVP
STDEVPA
STEP
STEYX
SUBSTITUTE
SUBTOTAL
SUM
SUMIF
SUMIFS
SUMPRODUCT
SUMSQ
SUMX2MY2
SUMX2PY2
SUMXMY2
SYD
T
TAN
TANH
TDIST
TEXT
TEXTJOIN
TIME
TIMEVALUE
TINV
TODAY
TRANSPOSE
TREND
TRIM
TRIMMEAN
TRUE
TRUNC
TTEST
TYPE
UPPER
USDOLLAR
VALUE
VAR
VARA
VARP
VARPA
VDB
VLOOKUP
WEEKDAY
WEEKNUM
WEIBULL
WINDOW.TITLE
WORKDAY
YEAR
YEARFRAC
YEN
ZTEST

by Amol Deshmukh, Yegor Kozlov