👋 I'm Simon.

Currently, I'm a PhD student at the Berkeley Sky Computing Lab, working on machine learning systems and cloud infrastructure. I am advised by Prof. Joseph Gonzalez and Prof. Ion Stoica.

My latest focus is building an end-to-end stack for LLM inference on your own infrastructure:

  • vLLM runs LLM inference efficiently (see the sketch below).
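
For a flavor of what that looks like, here is a minimal sketch of vLLM's offline inference API. The model name is just an example; any Hugging Face model that vLLM supports works here.

```python
from vllm import LLM, SamplingParams

# Load a model (example choice; swap in any supported HF model name).
llm = LLM(model="facebook/opt-125m")
sampling = SamplingParams(temperature=0.8, max_tokens=64)

# Batched generation: vLLM schedules prompts together for high throughput.
outputs = llm.generate(
    ["Hello, my name is", "The capital of France is"],
    sampling,
)
for output in outputs:
    print(output.outputs[0].text)
```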

Previous exploration includes:

  • Conex: builds, pushes, and pulls containers fast.
  • SkyATC: orchestrates LLMs across multiple clouds and scales them to zero.

I previously worked on the model serving system @anyscale.

  • Ray takes your Python code and scales it to thousands of cores (see the sketch below).
  • Ray Serve empowers data scientists to own their end-to-end inference APIs.
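
As an illustration, here is a minimal sketch of both: Ray's task API for scaling a plain Python function, and a Ray Serve deployment for serving it behind an API. The function and class names are hypothetical examples.

```python
import ray
from ray import serve

ray.init()  # connect to (or start) a Ray cluster

# Ray core: each call becomes a task scheduled across the cluster's cores.
@ray.remote
def square(x):
    return x * x

futures = [square.remote(i) for i in range(1000)]
print(sum(ray.get(futures)))

# Ray Serve: wrap the same logic as a scalable HTTP deployment.
@serve.deployment
class Squarer:
    async def __call__(self, request):
        x = int((await request.json())["x"])
        return {"result": x * x}

serve.run(Squarer.bind())
```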

Before Anyscale, I was an undergraduate researcher @ucbrise.

Publications:

Reach out to me: simon.mo at hey.com

Pinned

  1. vllm-project/vllm (Public)

     A high-throughput and memory-efficient inference and serving engine for LLMs

     Python · 29.8k stars · 4.5k forks