Cache strategy

From Meta, a Wikimedia project coordination wiki

The following description dates from March 2005. By 2006, the following changes had taken place:

  • More database servers, now with 16GB of RAM
  • More memcached instances
  • Server names and roles change constantly; those mentioned are the ones that happened to be in use at the time.

Wikimedia uses several levels of caching to improve site performance:

  • Squid cache servers handle about 78% of requests, almost all of which are made by viewers who are not logged in to the site. During load surges from media mentions, the Squids handle almost all of the traffic. Squid was first used on 2 February 2004 (see Cache bugs for early issues). wikitech:MediaWiki caching has more up-to-date notes on Wikimedia's Squid caching.
  • Memcached is used to save web pages which have been parsed, so that step doesn't need to be carried out repeatedly. This adds about 7% to the overall cache hit rate for pages. As of 19 September 2004, 34 instances of 180MB each are in use: 12 on Yongle, 6 each on Bart and Bayle, 2 each on Isidore and Moreri, and one each on dalembert, Tingxi, Alrazi, Friedrich, Harris and Avicenna, for a total of 6120MB. Memcached also caches login session IDs and user interface text in the various languages.
  • APC on the Apache web servers is used for PHP caching to improve the performance of the web servers. PHP is normally compiled into bytecode each time a script is run; caching the bytecode saves much of the CPU overhead of continually recompiling the same code.
  • The database servers have large caches:
    • Ariel has 8GB of RAM in total, of which 5.8GB is used for InnoDB caching, giving a hit rate over 99%. The remaining RAM is used for in-memory sorting, temporary tables used in SQL query processing and buffering of non-InnoDB table types.
    • Suda, the former master and now fallback master and general database slave, has 4GB available for caching, as does Bacon, a query slave.
  • Ariel has a RAID controller with a 64MB battery-backed cache to help disk performance. This is particularly significant for database transaction log entries, which are written to disk very frequently. This cache allows the RAID controller to report that a write has completed without having to wait for the disks to actually write the data.
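
As a quick check on the memcached figures above, the per-host counts can be tallied in a few lines of Python. This is only an illustrative sketch: it assumes that "6 on Bart and Bayle" and "2 on Isidore and Moreri" mean six and two instances on each host (the reading that matches the stated total of 34), and it treats the roughly 7% memcached contribution as simply additive to the roughly 78% Squid hit rate. The function names are made up for this example.

```python
# Memcached instances per host as listed for 19 September 2004.
# Assumption: "6 on Bart and Bayle" means 6 on each, likewise "2 on Isidore and Moreri".
INSTANCES_PER_HOST = {
    "Yongle": 12,
    "Bart": 6,
    "Bayle": 6,
    "Isidore": 2,
    "Moreri": 2,
    "dalembert": 1,
    "Tingxi": 1,
    "Alrazi": 1,
    "Friedrich": 1,
    "Harris": 1,
    "Avicenna": 1,
}
INSTANCE_SIZE_MB = 180  # size of each memcached instance, from the text


def memcached_totals(instances=INSTANCES_PER_HOST, size_mb=INSTANCE_SIZE_MB):
    """Return (number of instances, total cache size in MB)."""
    count = sum(instances.values())
    return count, count * size_mb


def combined_hit_rate_percent(squid_percent, memcached_percent):
    """Add the two hit rates, assuming both are shares of all page requests."""
    return squid_percent + memcached_percent


print(memcached_totals())                # (34, 6120)
print(combined_hit_rate_percent(78, 7))  # 85
```

Under these assumptions, roughly 85% of page requests are answered from a cache before any page has to be parsed again.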

Load balancing

In June 2004, load was balanced as follows:

  • Round robin DNS distributed page requests evenly across three Squid cache servers.
  • Squid cache servers used response time measurements to distribute page requests between seven web servers. In addition, the Squid servers cached pages and delivered about 75% of all pages without ever asking a web server for help.
  • The PHP scripts running on the web servers distributed load to one of several database servers depending on the type of request, with updates going to a master database server and some database queries going to one or more slave database servers.
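
The three tiers described above can be sketched in Python as follows. This is a minimal illustration, not the actual DNS, Squid or MediaWiki logic: the class names and the server names used in the usage example are hypothetical, and the smoothed response time average is an assumed stand-in for whatever measurement Squid really used.

```python
import itertools


class RoundRobin:
    """Hands out servers in a fixed rotation, as round robin DNS does."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def pick(self):
        return next(self._cycle)


class FastestBackend:
    """Prefers the web server with the lowest smoothed response time.

    Assumption: an exponential moving average stands in for the
    response time measurements mentioned in the text.
    """

    def __init__(self, servers, alpha=0.5):
        self.alpha = alpha
        self.avg = {server: None for server in servers}

    def record(self, server, seconds):
        old = self.avg[server]
        if old is None:
            self.avg[server] = seconds
        else:
            self.avg[server] = (1 - self.alpha) * old + self.alpha * seconds

    def pick(self):
        # Servers with no measurement yet count as 0.0 so they get tried first.
        return min(self.avg, key=lambda s: 0.0 if self.avg[s] is None else self.avg[s])


class DatabaseRouter:
    """Sends updates to the master and rotates read queries over the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = RoundRobin(slaves) if slaves else None

    def pick(self, is_update):
        if is_update or self._slaves is None:
            return self.master
        return self._slaves.pick()


# Usage with hypothetical server names.
dns = RoundRobin(["squid1", "squid2", "squid3"])
web = FastestBackend(["web1", "web2"])
web.record("web1", 0.40)
web.record("web2", 0.10)
db = DatabaseRouter("db-master", ["db-slave1", "db-slave2"])

print(dns.pick(), web.pick(), db.pick(is_update=True), db.pick(is_update=False))
```

Keeping the three choices in separate objects mirrors the description: each tier decides independently, and only requests that miss the Squid cache reach the web and database tiers at all.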

See also
