REST server and data probe are the two runtime components that you need to run when using NLPCraft. Data probes are used to deploy and host data models, while the REST server (or a cluster of servers) accepts client REST calls and routes them to the data models via data probes.
It's important to understand why the REST server is a separate component from the data probe. While a typical deployment has only one REST server (or a cluster of REST servers behind a single load balancer), there may be multiple data probes hosting different data models, deployed in different physical locations, managed through different life cycles, and requiring different security and network configurations.
Moreover, the REST server is a heavy, resource-consuming component built around Apache Ignite distributed in-memory computing capabilities, while the data probe is a lightweight data model container. During development and testing of data models, developers need to frequently redeploy data models by restarting the data probe. If the REST server and the data probe were one component, this process would be very inefficient.
Configuration
Both the REST server and the data probe can share their configuration file or be configured individually. Read more about this in the configuration section below.
The binary NLPCraft ZIP download comes with a single executable JAR file that includes all necessary dependencies (except for examples): build/apache-nlpcraft-incubating-0.9.0-all-deps.jar. This single all-inclusive JAR file can be used to start any NLPCraft runtime component as a standard Java application:
Note that if you downloaded the source ZIP you need to run mvn clean package -P examples to get the apache-nlpcraft-incubating-0.9.0-all-deps.jar file. It will be located in the nlpcraft/target sub-folder.
If you downloaded the binary release, the example JARs are pre-built and shipped within it. They are located in the build/nlpcraft-examples/xxx folder for each xxx example. If you downloaded a source release, you will need to run mvn clean package -P examples and the example JARs will be located in each individual module under its target sub-folder.
The REST server accepts client REST calls and routes them to the data models hosted by data probes. The REST server can be started in different ways:
$ bin/nlpcraft.sh start-server
NOTES:
Use nlpcraft.sh for Unix/Linux/macOS and nlpcraft.cmd for Windows.
Run bin/nlpcraft.sh help --cmd=start-server to get the full help on this command.
If using executable JAR:
$ java -Xms1024m -jar apache-nlpcraft-incubating-0.9.0-all-deps.jar -server
If specifying additional classpath components, you need the -cp parameter:
$ java -Xms1024m -cp apache-nlpcraft-incubating-0.9.0-all-deps.jar org.apache.nlpcraft.NCStart -server
NOTES:
When using -cp, the classpath must include the apache-nlpcraft-incubating-0.9.0-all-deps.jar file.
org.apache.nlpcraft.NCStart is a common entry point for all NLPCraft runtime components and can be used to start the REST server from an IDE.
Parameters:
-server
Mandatory parameter indicating that the REST server should be started.
-config=path
Optional parameter providing the server configuration file path. By default, the server looks for the nlpcraft.conf configuration file in the same directory as the apache-nlpcraft-incubating-0.9.0-all-deps.jar file. If the configuration file has a different name or is in a different location, use the -config=path parameter where path is an absolute path to the configuration file. Note that the server and the data probe can use the same file for their configuration (the default nlpcraft.conf contains configuration for both the server and the data probe).
-igniteConfig=path
Optional parameter providing the Apache Ignite configuration file path. By default, the server looks for the ignite.xml configuration file in the same directory as the apache-nlpcraft-incubating-0.9.0-all-deps.jar file. If the configuration file has a different name or is in a different location, use the -igniteConfig=path parameter where path is an absolute path to the Ignite configuration file.
VM Options:
NLPCraft REST server uses Apache Ignite 2.x as its distributed in-memory computing plane. Apache Ignite requires the following additional JVM options to be used when running Apache Ignite 2.x on JDK 11:
--add-opens=java.base/jdk.internal.misc=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED --add-opens=jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED --add-opens=java.base/sun.reflect.generics.reflectiveObjects=ALL-UNNAMED --add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED --illegal-access=permit
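When launching the server by hand rather than via the script, it can be convenient to collect these options into a shell variable once; a minimal sketch (the variable name is ours):

```shell
# Collect the JDK 11 options listed above into a variable for reuse
# when starting the server manually (the nlpcraft.sh script adds them for you).
JDK11_OPTS="--add-opens=java.base/jdk.internal.misc=ALL-UNNAMED \
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED \
--add-opens=java.base/java.nio=ALL-UNNAMED \
--add-opens=java.base/java.io=ALL-UNNAMED \
--add-opens=java.base/java.util=ALL-UNNAMED \
--add-opens=java.base/java.lang=ALL-UNNAMED \
--add-opens=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED \
--add-opens=jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED \
--add-opens=java.base/sun.reflect.generics.reflectiveObjects=ALL-UNNAMED \
--add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED \
--illegal-access=permit"

# java $JDK11_OPTS -Xms1024m -jar apache-nlpcraft-incubating-0.9.0-all-deps.jar -server
```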
NOTES:
The nlpcraft.{sh|cmd} script automatically uses these options for the start-server command.
It is recommended to use the nlpcraft.{sh|cmd} script to manage the REST server lifecycle.
If a Docker image is available for the given version, you can start the REST server as follows:
$ docker run -m 8G -p 8081:8081 -p 8201:8201 -p 8202:8202 nlpcraftserver/server:0.9.0
Make sure to allocate enough memory for the server JVM using the -Xms JVM option, e.g. -Xms1024m. Many 3rd party NLP engines like Stanford CoreNLP are very memory intensive and may require several GBs of JVM heap depending on the models used. Note that when the server JVM has insufficient heap memory, Apache Ignite may log the following warning:
Jul-22 13:27:56 [INFO ] ...
Jul-22 13:28:08 [WARN ] Possible too long JVM pause: 11364 milliseconds.
Jul-22 13:28:11 [INFO ] ...
NOTES:
The nlpcraft.{sh|cmd} script automatically uses -Xms1024m for the start-server command.
Abnormally long GC pauses (over 5s) can be caused by excessive memory swapping performed by the OS due to insufficient JVM heap memory.
Data probes are used to deploy and host data models, and can also be started in several ways:
$ bin/nlpcraft.sh start-probe --cp=/path/to/model/classes # Use default configuration.
$ bin/nlpcraft.sh start-probe --cp=/path/to/model/classes --mdls=com.package.MyModel
> restart-probe # Restart the probe with the last set of parameters in REPL mode.
NOTES:
Use nlpcraft.sh for Unix/Linux/macOS and nlpcraft.cmd for Windows.
The --cp parameter must provide the additional JVM classpath for the models to deploy in this probe.
The --mdls parameter can be used to specify one or more specific models to deploy if more than one model is available.
Run bin/nlpcraft.sh help --cmd=start-probe to get the full help on this command.
If using executable JAR:
$ java -jar apache-nlpcraft-incubating-0.9.0-all-deps.jar -probe
If specifying additional classpath components, you need the -cp parameter:
$ java -cp apache-nlpcraft-incubating-0.9.0-all-deps.jar:/my/project/classes org.apache.nlpcraft.NCStart -probe -config=/my/project/probe.conf
NOTES:
The /my/project directory contains the user-defined model implementation.
The classpath must include the apache-nlpcraft-incubating-0.9.0-all-deps.jar file.
org.apache.nlpcraft.NCStart is a common entry point for all NLPCraft runtime components and should be used to start the data probe from an IDE.
Parameters:
-probe
Mandatory parameter indicating that the data probe should be started.
-config=path
Optional parameter to provide the probe configuration file path. The data probe will automatically look for the nlpcraft.conf configuration file in the same directory as the apache-nlpcraft-incubating-0.9.0-all-deps.jar file. If the configuration file has a different name or is in a different location, use the -config=path parameter where path is an absolute path to the data probe configuration file. Note that the server and the data probe can use the same file for their configuration (the default nlpcraft.conf contains configuration for both the server and the data probe).
Both the REST server and the data probe use Typesafe Config for their configuration:
The default configuration comes in the build/nlpcraft.conf file.
The server will look for the server.conf and then the nlpcraft.conf configuration file.
The probe will look for the probe.conf and then the nlpcraft.conf configuration file.
Configuration can be placed in any file as long as it has the nlpcraft.server or nlpcraft.probe sub-section.
By default, when the REST server or the data probe starts, it looks for its configuration file in the same directory as the apache-nlpcraft-incubating-0.9.0-all-deps.jar file and then on the classpath. You can change this behavior with the -config=path parameter.
Configuration Example
Default configuration is available in the build/nlpcraft.conf file and is extensively documented, including all optional parameters and default values. It has sub-sections for the server and the probe configuration. You can also split the server and probe configurations into their own separate files. While the server and the probe can use any file name for their configuration files, the server looks for the server.conf file by default, and the probe looks for the probe.conf file, before searching for the nlpcraft.conf file. Each such file would have a sub-section of configuration for either the server or the data probe.
Server configuration file (e.g. server.conf
):
nlpcraft {
    server {
        ...
    }
}
Probe configuration file (e.g. probe.conf
):
nlpcraft {
    probe {
        ...
    }
}
While you can change the configuration file or files for your own needs (and use the -config=... parameter described above to provide the path to that file), it is often more convenient to use the default configuration file and change just one or two properties in it. You can accomplish this via standard HOCON overriding with environment variables:
Run with -Dconfig.override_with_env_vars=true, which instructs the configuration framework to look for external overrides.
To override a configuration property x.y.z, set the overriding environment variable CONFIG_FORCE_x_y_z=some_value.
Consider the following snippet of NLPCraft configuration:
nlpcraft {
    probe {
        models = "com.nlp.MyModel"
    }
    server {
        lifecycle = "org.apache.nlpcraft.server.lifecycle.opencensus.NCJaegerExporter"
        rest {
            host = "0.0.0.0"
            port = 8081
            apiImpl = "org.apache.nlpcraft.server.rest.NCBasicRestApi"
        }
    }
}
You can override these properties with the following environment variables:
CONFIG_FORCE_nlpcraft_server_rest_host=1.2.3.4
CONFIG_FORCE_nlpcraft_server_lifecycle="org.nlp.Lifecycle1, org.nlp.Lifecycle2"
CONFIG_FORCE_nlpcraft_probe_models="com.nlp.MyModel, com.nlp.AnotherModel"
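For simple dotted paths, the naming rule is to prefix with CONFIG_FORCE_ and replace dots with underscores. A tiny hypothetical helper illustrates it (HOCON's full name mangling also covers other special characters, which this sketch ignores):

```shell
# Hypothetical helper: derive the overriding environment variable name
# for a simple dotted HOCON configuration path.
to_env_var() {
    echo "CONFIG_FORCE_$(echo "$1" | tr '.' '_')"
}
```

For example, to_env_var nlpcraft.probe.models yields CONFIG_FORCE_nlpcraft_probe_models.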
Note that all examples that come with NLPCraft include instructions that use environment variable overriding for running their data probes. They use the default nlpcraft.conf file and override the single nlpcraft.probe.models property (see above) to specify which model the data probe needs to deploy.
Both the NLPCraft server and probe use ANSI coloring via ANSI escape sequences for their log output by default. ANSI coloring provides easier console log comprehension and modern aesthetics.
However, there are cases when a specific console does not support ANSI escape sequences, a specific color scheme isn't suitable, or the log is being redirected to a file or piped to a downstream system. In these cases you need to disable ANSI coloring to avoid polluting the log with unprocessed ANSI escape codes.
You can disable ANSI coloring in the server, the probe, or both by supplying the following system property to the JVM process: -DNLPCRAFT_ANSI_COLOR_DISABLED=true
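For scripting, one might wrap this in a small helper that disables coloring only when the log is actually redirected; this is a hypothetical sketch, not part of NLPCraft itself:

```shell
# Hypothetical helper: emit the ANSI-disabling system property only when
# stdout is not an interactive terminal (i.e. the log is redirected or piped).
ansi_flag() {
    if [ -t 1 ]; then
        echo ""  # interactive terminal: keep ANSI coloring
    else
        echo "-DNLPCRAFT_ANSI_COLOR_DISABLED=true"
    fi
}

# Usage (the flag expands to nothing on an interactive console):
# java $(ansi_flag) -Xms1024m -jar apache-nlpcraft-incubating-0.9.0-all-deps.jar -server
```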
It is good practice to run unit tests during routine builds using Maven (or other CI toolchains). To test data models you need to have a running server and then start one or more data probes with the models you want to test. While doing this from an IDE can be trivial enough, doing it from Maven can be tricky.
The challenge is that from the Maven build you need to start the server, wait until it is fully started and initialized, and only then start issuing REST calls, start data probes, or run tests that use embedded probes. When done manually (e.g. from an IDE) you can visually observe when the server has finished its startup and then manually launch the tests. In Maven, however, you need a special plugin to accomplish the same in an automated fashion.
Technically, when a data probe starts up it will initialize, load the models, and automatically wait for the server to come online if it isn't yet (as well as periodically check for it). Once the server is online, the data probe will automatically connect to it. However, if the unit tests don't use a data probe and only issue REST calls, these tests have to somehow wait for the server to come online.
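Outside of Maven, this wait can be scripted by polling the server's /health endpoint until it answers; a minimal sketch (the helper name and retry defaults are ours, the default URL matches the server's standard health check endpoint):

```shell
# Poll the NLPCraft server health endpoint until it responds, so that
# REST-only tests can be started once the server is fully initialized.
wait_for_server() {
    url="${1:-http://localhost:8081/api/v1/health}"
    retries="${2:-90}"
    while [ "$retries" -gt 0 ]; do
        if curl -sf "$url" > /dev/null 2>&1; then
            echo "server is up"
            return 0
        fi
        retries=$((retries - 1))
        sleep 2
    done
    echo "server did not start in time" >&2
    return 1
}
```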
To overcome this challenge, NLPCraft uses the process-exec-maven-plugin Maven plugin in its own build. This plugin starts an external process and uses a configured URL endpoint to check whether the external process has fully started, which works perfectly with the NLPCraft server health check REST call. The plugin can be configured in the following way for your own project (taken directly from the NLPCraft pom.xml):
<plugin>
    <groupId>com.bazaarvoice.maven.plugins</groupId>
    <artifactId>process-exec-maven-plugin</artifactId>
    <version>0.9</version>
    <executions>
        <execution>
            <id>pre-integration-test</id>
            <phase>pre-integration-test</phase>
            <goals>
                <goal>start</goal>
            </goals>
            <configuration>
                <name>server</name>
                <healthcheckUrl>http://localhost:8081/api/v1/health</healthcheckUrl>
                <waitAfterLaunch>180</waitAfterLaunch>
                <processLogFile>${project.build.directory}/server.log</processLogFile>
                <arguments>
                    <argument>java</argument>
                    <argument>-Xmx4G</argument>
                    <argument>-Xms4G</argument>
                    <argument>--add-opens=java.base/jdk.internal.misc=ALL-UNNAMED</argument>
                    <argument>--add-opens=java.base/sun.nio.ch=ALL-UNNAMED</argument>
                    <argument>--add-opens=java.base/java.nio=ALL-UNNAMED</argument>
                    <argument>--add-opens=java.base/java.io=ALL-UNNAMED</argument>
                    <argument>--add-opens=java.base/java.util=ALL-UNNAMED</argument>
                    <argument>--add-opens=java.base/java.lang=ALL-UNNAMED</argument>
                    <argument>--add-opens=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED</argument>
                    <argument>--add-opens=jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED</argument>
                    <argument>--add-opens=java.base/sun.reflect.generics.reflectiveObjects=ALL-UNNAMED</argument>
                    <argument>--add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED</argument>
                    <argument>--illegal-access=permit</argument>
                    <argument>-DNLPCRAFT_ANSI_COLOR_DISABLED=true</argument>
                    <argument>-Djdk.tls.client.protocols=TLSv1.2</argument>
                    <argument>-jar</argument>
                    <argument>${project.build.directory}/${nlpcraft.all.deps.jar}</argument>
                    <argument>-server</argument>
                </arguments>
            </configuration>
        </execution>
        <execution>
            <id>stop-all</id>
            <phase>post-integration-test</phase>
            <goals>
                <goal>stop-all</goal>
            </goals>
        </execution>
    </executions>
</plugin>
NOTES:
The plugin uses the localhost /health REST call as the health check URL.
When running both the server and the data probe(s) from the Maven build, it is important to avoid interleaving logs from the server and the probe. Such interleaving makes the combined Maven log unreadable and can cause console malfunction due to mixed-up ANSI escape codes. It is idiomatic in such cases to:
Disable ANSI coloring with -DNLPCRAFT_ANSI_COLOR_DISABLED=true.
Redirect the server log to a file via the processLogFile configuration property of process-exec-maven-plugin.