Hello!
I am working on a deep learning project using PyTorch Lightning.
I want to run training on multiple nodes, with multiple GPUs on each node.
I am following this tutorial, which says to run this command on each of the nodes:
# --nnodes: number of nodes you'd like to run with
python -m torch.distributed.run \
    --nnodes=2 \
    --master_addr <MASTER_ADDR> \
    --master_port <MASTER_PORT> \
    --node_rank <NODE_RANK> \
    train.py (--arg1 ... train script args...)
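For example, if I designate the first node as the master and suppose its IP is 10.10.10.1 and the port is 29500 (both made-up values, just for illustration), I believe the two launches would look like this:

# on node 0 (the master node itself); 10.10.10.1 and 29500 are hypothetical
python -m torch.distributed.run \
    --nnodes=2 \
    --master_addr 10.10.10.1 \
    --master_port 29500 \
    --node_rank 0 \
    train.py

# on node 1: the exact same command except for --node_rank
python -m torch.distributed.run \
    --nnodes=2 \
    --master_addr 10.10.10.1 \
    --master_port 29500 \
    --node_rank 1 \
    train.py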
So my question is: how can I determine the MASTER_ADDR and MASTER_PORT?
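From what I've gathered (this is my assumption, not something the tutorial states), MASTER_ADDR should be the IP address of whichever node gets --node_rank 0, and MASTER_PORT can be any free TCP port on that node. Here is a small Python sketch of how I would look both up on the master node; get_ip and find_free_port are just helper names I made up:

import socket

def get_ip() -> str:
    # Resolve this machine's hostname to an IP address.
    # Caveat: on some setups this returns 127.0.0.1, in which case
    # the address of the actual network interface is needed instead.
    return socket.gethostbyname(socket.gethostname())

def find_free_port() -> int:
    # Bind to port 0 so the OS assigns an arbitrary free TCP port,
    # then report which port it picked.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        return s.getsockname()[1]

if __name__ == "__main__":
    print("MASTER_ADDR candidate:", get_ip())
    print("MASTER_PORT candidate:", find_free_port())

Is that the right way to think about it, or does the tutorial mean something else by these two values?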