Official implementation of the FAMO optimizer for multitask learning (NeurIPS 2023). [Paper]
One of the grand enduring goals of AI is to create generalist agents that can learn multiple different tasks from diverse data via multitask learning (MTL). However, gradient descent (GD) on the average loss across all tasks may yield poor multitask performance due to severe under-optimization of certain tasks. Previous approaches that manipulate task gradients for a more balanced loss decrease require storing and computing all task gradients (O(K) space and time, where K is the number of tasks), which becomes prohibitively expensive when K is large. FAMO is a dynamic weighting method that decreases task losses in a balanced way using only O(1) space and time per step.
Update 2023/11/11: FAMO has been used in MFTCoder to boost the fine-tuning performance of large language models!
Top left: the loss landscape and the individual task losses of a toy 2-task learning problem (★ marks the minimum of each task loss). Top right: the runtime of different MTL methods for 50000 steps. Bottom: the loss trajectories of different MTL methods. ADAM fails in 1 out of 5 runs to reach the Pareto front due to conflicting gradients (CG). FAMO decreases task losses in a balanced way and is the only method matching the speed of ADAM.
For the convenience of potential users of FAMO, we provide a simple example in famo.py so that users can easily adapt FAMO to their applications. The code requires torch, which can be installed via the setup in the next section. Check the file and simply run:

```bash
python famo.py
```
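For reference, here is a minimal sketch of how FAMO might be wired into a standard PyTorch training loop. It assumes the FAMO class in famo.py exposes a constructor FAMO(n_tasks, device) and the get_weighted_loss/update methods shown in that example file (check famo.py for the authoritative interface); the model and per-task losses below are placeholders.

```python
import torch
from famo import FAMO  # the weighting class provided in famo.py

n_tasks = 2
device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder model: anything that yields a vector of per-task losses works.
model = torch.nn.Linear(10, n_tasks).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
weight_opt = FAMO(n_tasks=n_tasks, device=device)

for step in range(100):
    x = torch.randn(32, 10, device=device)
    losses = model(x).pow(2).mean(dim=0)  # stand-in for the K task losses, shape (n_tasks,)

    # Backpropagate through the FAMO-weighted scalar loss.
    optimizer.zero_grad()
    weight_opt.get_weighted_loss(losses).backward()
    optimizer.step()

    # FAMO then adapts its task weights from the observed change in task losses.
    with torch.no_grad():
        new_losses = model(x).pow(2).mean(dim=0)
    weight_opt.update(new_losses)
```

Note that FAMO only needs the task losses again after the parameter update (the `update` call), which is what lets it run in O(1) space and time without storing per-task gradients.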
Create the conda environment and install torch:

```bash
conda create -n mtl python=3.9.7
conda activate mtl
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
```
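Optionally, a quick sanity check that the pinned CUDA build installed correctly (the expected version string is an assumption based on the pip pin above):

```python
import torch

print(torch.__version__)          # expect 1.12.1+cu113
print(torch.cuda.is_available())  # expect True on a CUDA 11.3-capable machine
```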
Install the repo:

```bash
git clone https://github.com/Cranial-XIX/FAMO.git
cd FAMO
pip install -e .
```
We follow the MTAN paper. The datasets can be downloaded from NYU-v2 and CityScapes. To download the CelebA dataset, please refer to this link. Each dataset should be put under the experiments/EXP_NAME/dataset/ folder, where EXP_NAME is chosen from nyuv2, cityscapes, celeba. Note that quantum_chemistry will download its data automatically.
The file hierarchy should look like:

```
FAMO
└─ experiments
   └─ utils.py          (for argument parsing)
   └─ nyuv2
      └─ dataset        (the dataset folder containing the MTL data)
      └─ trainer.py     (the main file to run the training)
      └─ run.sh         (the command to reproduce FAMO's results)
   └─ cityscapes
      └─ dataset        (the dataset folder containing the MTL data)
      └─ trainer.py     (the main file to run the training)
      └─ run.sh         (the command to reproduce FAMO's results)
   └─ quantum_chemistry
      └─ dataset        (the dataset folder containing the MTL data)
      └─ trainer.py     (the main file to run the training)
      └─ run.sh         (the command to reproduce FAMO's results)
   └─ celeba
      └─ dataset        (the dataset folder containing the MTL data)
      └─ trainer.py     (the main file to run the training)
      └─ run.sh         (the command to reproduce FAMO's results)
└─ methods
   └─ weight_methods.py (the different MTL optimizers)
```
To run an experiment, go to the relevant folder with name EXP_NAME:

```bash
cd experiments/EXP_NAME
bash run.sh
```

You can check run.sh for details about training with FAMO.
Following NashMTL, we also support experiment tracking with Weights & Biases with two additional parameters:

```bash
python trainer.py --method=famo --wandb_project=<project-name> --wandb_entity=<entity-name>
```
We support the following MTL methods with a unified API. To run an experiment with MTL method X, simply run:

```bash
python trainer.py --method=X
```
| Method (code name) | Paper (notes) |
| --- | --- |
| FAMO (`famo`) | Fast Adaptive Multitask Optimization |
| Nash-MTL (`nashmtl`) | Multi-Task Learning as a Bargaining Game |
| CAGrad (`cagrad`) | Conflict-Averse Gradient Descent for Multi-task Learning |
| PCGrad (`pcgrad`) | Gradient Surgery for Multi-Task Learning |
| IMTL-G (`imtl`) | Towards Impartial Multi-task Learning |
| MGDA (`mgda`) | Multi-Task Learning as Multi-Objective Optimization |
| DWA (`dwa`) | End-to-End Multi-Task Learning with Attention |
| Uncertainty weighting (`uw`) | Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics |
| Linear scalarization (`ls`) | - (equal weighting) |
| Scale-invariant baseline (`scaleinvls`) | - (see Nash-MTL paper for details) |
| Random Loss Weighting (`rlw`) | A Closer Look at Loss Weighting in Multi-Task Learning |
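Internally, the trainers construct these optimizers through methods/weight_methods.py. Below is a rough sketch of that pattern; the constructor and backward argument names are assumptions carried over from the NashMTL-style codebase this repo builds on, so check weight_methods.py for the exact signatures and per-method kwargs.

```python
import torch
from methods.weight_methods import WeightMethods  # unified MTL wrapper in this repo

n_tasks, device = 3, torch.device("cpu")
model = torch.nn.Linear(8, n_tasks)       # placeholder multitask model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# NOTE: argument names are assumptions; see weight_methods.py for the real API.
weight_method = WeightMethods(method="famo", n_tasks=n_tasks, device=device)

x = torch.randn(16, 8)
losses = model(x).pow(2).mean(dim=0)      # stand-in for the K task losses

optimizer.zero_grad()
loss, extra = weight_method.backward(     # the wrapper weights the losses and calls backward
    losses=losses,
    shared_parameters=list(model.parameters()),
)
optimizer.step()
```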
Following CAGrad, the MTRL experiments are conducted on Metaworld benchmarks. In particular, we follow the mtrl codebase and the experiment setup in this paper.
1. Install mtrl according to the instructions.
2. Clone Metaworld and check out commit d9a75c451a15b0ba39d8b7a8b6d18d883b8655d8 (Feb 26, 2021). Install Metaworld accordingly.
3. Copy the mtrl_files folder under mtrl of this repo to the cloned mtrl repo. Then run:

   ```bash
   cd PATH_TO_MTRL/mtrl_files/ && chmod +x mv.sh && ./mv.sh
   ```
Then follow the run.sh script to run experiments. (We are still testing the results, but the code should be runnable.)
This repo is built upon CAGrad and NashMTL. If you find FAMO useful in your own research, please consider citing the following papers:
```bibtex
@misc{liu2023famo,
  title={FAMO: Fast Adaptive Multitask Optimization},
  author={Bo Liu and Yihao Feng and Peter Stone and Qiang Liu},
  year={2023},
  eprint={2306.03792},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

@article{liu2021conflict,
  title={Conflict-Averse Gradient Descent for Multi-task Learning},
  author={Liu, Bo and Liu, Xingchao and Jin, Xiaojie and Stone, Peter and Liu, Qiang},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}

@article{navon2022multi,
  title={Multi-Task Learning as a Bargaining Game},
  author={Navon, Aviv and Shamsian, Aviv and Achituve, Idan and Maron, Haggai and Kawaguchi, Kenji and Chechik, Gal and Fetaya, Ethan},
  journal={arXiv preprint arXiv:2202.01017},
  year={2022}
}
```