The best way to install bert-as-service is via pip. Note that the server and client can be installed separately, or even on different machines:

```bash
pip install -U bert-serving-server bert-serving-client
```
The server MUST be running on Python >= 3.5 with TensorFlow >= 1.10 (one-point-ten). Again, the server does not support Python 2!

The client can run on both Python 2 and 3.
Download a model listed below, then uncompress the zip file into some folder, say `/tmp/english_L-12_H-768_A-12/`.
List of pretrained BERT models released by Google AI:

| Model | Details |
|---|---|
| BERT-Base, Uncased | 12-layer, 768-hidden, 12-heads, 110M parameters |
| BERT-Large, Uncased | 24-layer, 1024-hidden, 16-heads, 340M parameters |
| BERT-Base, Cased | 12-layer, 768-hidden, 12-heads, 110M parameters |
| BERT-Large, Cased | 24-layer, 1024-hidden, 16-heads, 340M parameters |
| BERT-Base, Multilingual Cased (New) | 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters |
| BERT-Base, Multilingual Cased (Old) | 102 languages, 12-layer, 768-hidden, 12-heads, 110M parameters |
| BERT-Base, Chinese | Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M parameters |
As an optional step, you can also fine-tune the model on your downstream task.
After installing the server, you should be able to use the `bert-serving-start` CLI as follows:

```bash
bert-serving-start -model_dir /tmp/english_L-12_H-768_A-12/ -num_worker=4
```
This will start a service with four workers, meaning that it can handle up to four concurrent requests. More concurrent requests will be queued in a load balancer.
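The worker/queue behavior can be illustrated with a plain Python thread pool. This is a generic sketch of the idea, not the actual ZeroMQ-based internals of bert-as-service: with four workers, a fifth simultaneous task waits until one of the workers frees up.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(i):
    # Stand-in for one encode request that takes ~0.1s to serve
    time.sleep(0.1)
    return i

start = time.time()
# Four "workers", five requests: the fifth request is queued
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_request, range(5)))
elapsed = time.time() - start

# Five tasks on four workers need two rounds: roughly 0.2s, not 0.1s
print(results, round(elapsed, 2))
```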
Below shows what the server looks like when starting correctly:
Alternatively, one can start the BERT Service in a Docker Container:

```bash
docker build -t bert-as-service -f ./docker/Dockerfile .
NUM_WORKER=1
PATH_MODEL=/PATH_TO/_YOUR_MODEL/
docker run --runtime nvidia -dit -p 5555:5555 -p 5556:5556 -v $PATH_MODEL:/model -t bert-as-service $NUM_WORKER
```
Now you can encode sentences simply as follows:

```python
from bert_serving.client import BertClient
bc = BertClient()
bc.encode(['First do it', 'then do it right', 'then do it better'])
```
It will return an `ndarray`, in which each row is the fixed-length representation of a sentence. You can also let it return a pure Python object instead.
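Once you have the returned array, downstream use is plain NumPy. Below is a minimal sketch of computing sentence similarity from such an array; since `bc.encode` needs a running server, random vectors of shape `(3, 768)` stand in for real encodings (768 is the hidden size of BERT-Base, per the table above).

```python
import numpy as np

# Stand-in for bc.encode(...) output: one fixed-length row per sentence
vecs = np.random.RandomState(0).rand(3, 768).astype(np.float32)

def cosine(a, b):
    # Cosine similarity between two 1-D vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Similarity of the first sentence to each of the others
sims = [cosine(vecs[0], v) for v in vecs[1:]]
print(sims)
```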
As a feature of BERT, you may get the encoding of a pair of sentences by concatenating them with ` ||| `:

```python
bc.encode(['First do it ||| then do it right'])
```
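If your sentence pairs start out as tuples, building these inputs is just string joining. A small sketch (the pair format itself comes from the example above; the variable names here are illustrative):

```python
pairs = [('First do it', 'then do it right'),
         ('then do it right', 'then do it better')]

# Build the 'sentence ||| sentence' strings used for pair encoding
inputs = [' ||| '.join(p) for p in pairs]
print(inputs)
```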
Below shows what the server looks like while encoding:
One may also start the service on one (GPU) machine and call it from another (CPU) machine as follows:

```python
# on another CPU machine
from bert_serving.client import BertClient

bc = BertClient(ip='xx.xx.xx.xx')  # ip address of the GPU machine
bc.encode(['First do it', 'then do it right', 'then do it better'])
```
You only need `pip install -U bert-serving-client` in this case; the server-side package is not required.
Want to learn more? Check out our tutorials below: