
Query string

From Wikipedia, the free encyclopedia

A query string is a part of a uniform resource locator (URL) that assigns values to specified parameters. A query string commonly includes fields added to a base URL by a Web browser or other client application, for example as part of an HTML document, choosing the appearance of a page, or jumping to positions in multimedia content.

An address bar on Google Chrome showing a URL with the query string title=Query_string&action=edit

A web server can handle a Hypertext Transfer Protocol (HTTP) request either by reading a file from its file system based on the URL path or by handling the request using logic that is specific to the type of resource. In cases where special logic is invoked, the query string will be available to that logic for use in its processing, along with the path component of the URL.

Structure


A typical URL containing a query string is as follows:

https://example /over/there?name=ferret

When a server receives a request for such a page, it may run a program, passing the query string, which in this case is name=ferret, unchanged to the program. The question mark is used as a separator, and is not part of the query string.[1][2]

Web frameworks may provide methods for parsing multiple parameters in the query string, separated by some delimiter.[3] In the example URL below, multiple query parameters are separated by the ampersand, "&":

https://example /path/to/page?name=ferret&color=purple

The exact structure of the query string is not standardized. Methods used to parse the query string may differ between websites.
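
As a minimal sketch of such parsing, assuming Python's standard urllib.parse module (one possible parser among many, not one prescribed by the sources cited here), the second example can be split into its path and parameters. Only the path and query of the example are used, and the variable names are illustrative.

```python
from urllib.parse import urlsplit, parse_qs

# Path and query taken from the example above; the host is omitted.
reference = "/path/to/page?name=ferret&color=purple"

parts = urlsplit(reference)
# The "?" separator is not part of the query component.
query = parts.query

# parse_qs splits on "&" and "=", returning a list of values per field.
params = parse_qs(query)

print(parts.path)  # /path/to/page
print(query)       # name=ferret&color=purple
print(params)      # {'name': ['ferret'], 'color': ['purple']}
```

Other frameworks may split or decode the same string differently, which is the point made above about the lack of a single standard.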

A link in a web page may have a URL that contains a query string. HTML defines three ways a user agent can generate the query string:

  • an HTML form via the <form>...</form> element
  • a server-side image map via the ismap attribute on the <img> element with an <img ismap> construction
  • an indexed search via the now deprecated <isindex> element

Web forms


One of the original uses was to contain the content of an HTML form, also known as a web form. In particular, when a form containing the fields field1, field2, field3 is submitted, the content of the fields is encoded as a query string as follows:

field1=value1&field2=value2&field3=value3...

  • The query string is composed of a series of field-value pairs.
  • Within each pair, the field name and value are separated by an equals sign, "=".
  • The series of pairs is separated by the ampersand, "&" (semicolons ";" are not recommended by the W3C anymore, see below).

While there is no definitive standard, most web frameworks allow multiple values to be associated with a single field (e.g. field1=value1&field1=value2&field2=value3).[4][5]
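
How repeated fields are exposed depends on the framework. The sketch below shows one behavior, assuming Python's standard urllib.parse module; other libraries may keep only the first or last value instead.

```python
from urllib.parse import parse_qs, parse_qsl

query = "field1=value1&field1=value2&field2=value3"

# parse_qs groups repeated fields into one list per field name.
grouped = parse_qs(query)

# parse_qsl keeps every pair in its original order instead.
ordered = parse_qsl(query)

print(grouped)  # {'field1': ['value1', 'value2'], 'field2': ['value3']}
print(ordered)  # [('field1', 'value1'), ('field1', 'value2'), ('field2', 'value3')]
```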

For each field of the form, the query string contains a pair field=value. Web forms may include fields that are not visible to the user; these fields are included in the query string when the form is submitted.

This convention is a W3C recommendation.[3] In the recommendations of 1999, W3C recommended that all web servers support semicolon separators in addition to ampersand separators[6] to allow application/x-www-form-urlencoded query strings in URLs within HTML documents without having to entity escape ampersands. Since 2014, W3C recommends using only the ampersand as the query separator.[7]

The form content is only encoded in the URL's query string when the form submission method is GET. The same encoding is used by default when the submission method is POST, but the result is submitted as the HTTP request body rather than being included in a modified URL.[8]
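
The difference between the two submission methods can be sketched as follows, assuming Python's standard urllib modules. The endpoint http://localhost/cgi-bin/test.cgi and the field names are hypothetical, and the requests are only constructed, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and fields, used only for illustration.
action = "http://localhost/cgi-bin/test.cgi"
fields = {"field1": "value1", "field2": "value2"}

encoded = urlencode(fields)  # field1=value1&field2=value2

# GET: the encoded form content becomes the URL's query string.
get_request = Request(action + "?" + encoded, method="GET")

# POST: the same encoding is carried in the request body instead.
post_request = Request(
    action,
    data=encoded.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

print(get_request.full_url)   # http://localhost/cgi-bin/test.cgi?field1=value1&field2=value2
print(post_request.full_url)  # http://localhost/cgi-bin/test.cgi
print(post_request.data)      # b'field1=value1&field2=value2'
```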

Indexed search

Before forms were added to HTML, browsers rendered the <isindex> element as a single-line text-input control. The text entered into this control was sent to the server as a query string addition to a GET request for the base URL or another URL specified by the action attribute.[9] This was intended to allow web servers to use the provided text as query criteria so they could return a list of matching pages.[10]

When the text input into the indexed search control is submitted, it is encoded as a query string as follows:

argument1+argument2+argument3...

  • The query string is composed of a series of arguments by parsing the text into words at the spaces.
  • The series is separated by the plus sign, '+'.

Though the <isindex> element is deprecated and most browsers no longer support or render it, there are still some vestiges of indexed search in existence. For example, this is the source of the special handling of the plus sign, '+', within browser URL percent encoding (which today, with the deprecation of indexed search, is all but redundant with %20). Also, some web servers supporting CGI (e.g., Apache) will process the query string into command line arguments if it does not contain an equals sign, '=' (as per section 4.4 of CGI 1.1). Some CGI scripts still depend on and use this historic behavior for URLs embedded in HTML.
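
The command-line behavior described above can be approximated as follows. This is a rough sketch, not the code of any particular server: the function name indexed_search_words is hypothetical, and real servers may apply further rules before passing arguments to a script.

```python
from urllib.parse import unquote

def indexed_search_words(query: str) -> list[str]:
    """Split an indexed-search query into decoded words.

    Returns an empty list when the query is empty or contains an
    unencoded "=", in which case it is treated as field-value pairs.
    """
    if not query or "=" in query:
        return []
    # Words are separated by "+"; each word is percent-decoded afterwards.
    return [unquote(word) for word in query.split("+")]

print(indexed_search_words("argument1+argument2+argument3"))
# ['argument1', 'argument2', 'argument3']
print(indexed_search_words("name=ferret"))  # []
print(indexed_search_words("a%2Bb+c"))      # ['a+b', 'c']
```

Decoding happens only after splitting, so an encoded plus sign (%2B) inside a word is not mistaken for a separator.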

URL encoding


Some characters cannot be part of a URL (for example, the space) and some other characters have a special meaning in a URL: for example, the character # can be used to further specify a subsection (or fragment) of a document. In HTML forms, the character = is used to separate a name from a value. The URI generic syntax uses URL encoding to deal with this problem, while HTML forms make some additional substitutions rather than applying percent encoding for all such characters. SPACE is encoded as '+' or "%20".[11]

HTML 5 specifies the following transformation for submitting HTML forms with the "GET" method to a web server. The following is a brief summary of the algorithm:

  • Characters that cannot be converted to the correct charset are replaced with HTML numeric character references[12]
  • SPACE is encoded as '+' or '%20'
  • Letters (A-Z and a-z), numbers (0-9) and the characters '*', '-', '.' and '_' are left as-is
  • + is encoded by %2B
  • All other characters are encoded as a %HH hexadecimal representation with any non-ASCII characters first encoded as UTF-8 (or other specified encoding)

The octet corresponding to the tilde ("~") is permitted in query strings by RFC 3986 but required to be percent-encoded in HTML forms to "%7E".

The encoding of SPACE as '+' and the selection of "as-is" characters distinguish this encoding from RFC 3986.
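
The two treatments of SPACE can be compared with a short sketch, assuming Python's standard urllib.parse module. Python's functions follow the RFC 3986 unreserved set, so their handling of '~' and '*' may differ from the HTML form algorithm summarized above; the example avoids those characters.

```python
from urllib.parse import quote, quote_plus

text = "was it clear (already)?"

# Generic percent-encoding: SPACE becomes %20.
print(quote(text, safe=""))    # was%20it%20clear%20%28already%29%3F

# Form-style encoding: SPACE becomes "+", and a literal "+" becomes %2B.
print(quote_plus(text))        # was+it+clear+%28already%29%3F
print(quote_plus("1+1"))       # 1%2B1

# Non-ASCII characters are encoded as UTF-8 before percent-encoding.
print(quote_plus("caf\u00e9"))  # caf%C3%A9
```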

Example


If a form is embedded in an HTML page as follows:

<form action="/cgi-bin/test.cgi" method="get">
  <input type="text" name="first" />
  <input type="text" name="second" />
  <input type="submit" />
</form>

and the user inserts the strings "this is a field" and "was it clear (already)?" in the two text fields and presses the submit button, the program test.cgi (the program specified by the action attribute of the form element in the above example) will receive the following query string: first=this+is+a+field&second=was+it+clear+%28already%29%3F.

If the form is processed on the server by a CGI script, the script may typically receive the query string as an environment variable named QUERY_STRING.
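
A script on the receiving side might recover the two fields roughly as follows. This is a sketch under stated assumptions: the helper name read_form_fields is hypothetical, and the environment is simulated with a plain dictionary rather than set by a real web server.

```python
import os
from urllib.parse import parse_qs

def read_form_fields(environ=os.environ):
    """Parse the QUERY_STRING variable that a CGI server would set."""
    query = environ.get("QUERY_STRING", "")
    return parse_qs(query)

# Simulate the environment the server would provide for the example form.
fake_environ = {
    "QUERY_STRING": "first=this+is+a+field&second=was+it+clear+%28already%29%3F"
}

fields = read_form_fields(fake_environ)
print(fields["first"])   # ['this is a field']
print(fields["second"])  # ['was it clear (already)?']
```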

Tracking


A program receiving a query string can ignore part or all of it. If the requested URL corresponds to a file and not to a program, the whole query string is ignored. However, regardless of whether the query string is used or not, the whole URL including it is stored in the server log files.

These facts allow query strings to be used to track users in a manner similar to that provided by HTTP cookies. For this to work, every time the user downloads a page, a unique identifier must be chosen and added as a query string to the URLs of all links the page contains. As soon as the user follows one of these links, the corresponding URL is requested from the server. This way, the download of this page is linked with the previous one.

For example, when a web page containing the following is requested:

<a href="foo.html">see my page!</a>
<a href="bar.html">mine is better</a>

a unique string, such as e0a72cb2a2c7, is chosen, and the page is modified as follows:

<a href="foo.html?e0a72cb2a2c7">see my page!</a>
<a href="bar.html?e0a72cb2a2c7">mine is better</a>

The addition of the query string does not change the way the page is shown to the user. When the user follows, for example, the first link, the browser requests the page foo.html?e0a72cb2a2c7 from the server, which ignores what follows ? and sends the page foo.html as expected, adding the query string to its links as well.

This way, any subsequent page request from this user will carry the same query string e0a72cb2a2c7, making it possible to establish that all these pages have been viewed by the same user. Query strings are often used in association with web beacons.
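
The link rewriting described above could be sketched as follows. The function name add_tracking_id is hypothetical, and the regular expression is a deliberate simplification: it only handles double-quoted href values that have no existing query string or fragment, which a real implementation would need to cover.

```python
import re
import secrets

def add_tracking_id(html: str, token: str) -> str:
    """Append ?token to every double-quoted href value in html."""
    return re.sub(
        r'href="([^"?#]*)"',
        lambda match: f'href="{match.group(1)}?{token}"',
        html,
    )

page = (
    '<a href="foo.html">see my page!</a>\n'
    '<a href="bar.html">mine is better</a>'
)

# A fresh identifier would normally be generated per visitor, for example
# 12 hexadecimal characters; a fixed one is used here to match the text.
random_token = secrets.token_hex(6)
print(add_tracking_id(page, "e0a72cb2a2c7"))
```

With the fixed token, the output matches the modified links shown earlier in this section.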

The main differences between query strings used for tracking and HTTP cookies are that:

  1. Query strings form part of the URL, and are therefore included if the user saves or sends the URL to another user; cookies can be maintained across browsing sessions, but are not saved or sent with the URL.
  2. If the user arrives at the same web server by two (or more) independent paths, the user will be assigned two different query strings, while the stored cookies are the same.
  3. The user can disable cookies, in which case using cookies for tracking does not work. However, using query strings for tracking should work in all situations.
  4. Different query strings passed by different visits to the page will mean that the pages are never served from the browser (or proxy, if present) cache, thereby increasing the load on the web server and slowing down the user experience.

Compatibility issues


According to the HTTP specification:

Various ad hoc limitations on request-line length are found in practice. It is RECOMMENDED that all HTTP senders and recipients support, at a minimum, request-line lengths of 8000 octets.[13]

If the URL is too long, the web server fails with the 414 Request-URI Too Long HTTP status code.

The common workaround for these problems is to use POST instead of GET and store the parameters in the request body. The length limits on request bodies are typically much higher than those on URL length. For example, the limit on POST size, by default, is 2 MB on IIS 4.0 and 128 KB on IIS 5.0. The limit is configurable on Apache2 using the LimitRequestBody directive, which specifies the number of bytes from 0 (meaning unlimited) to 2147483647 (2 GB) that are allowed in a request body.[14]
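
A client could apply this workaround with a simple length check, sketched below. The function choose_method is hypothetical, and using the 8000-octet figure as a threshold is an assumption: actual limits vary between servers and intermediaries.

```python
from urllib.parse import urlencode

# Minimum request-line length that the quoted text recommends supporting.
RECOMMENDED_LIMIT = 8000

def choose_method(path: str, params: dict, limit: int = RECOMMENDED_LIMIT) -> str:
    """Return "GET" if the request line fits within limit octets, else "POST"."""
    # The path is assumed to be percent-encoded already, so it is pure ASCII.
    request_line = f"GET {path}?{urlencode(params)} HTTP/1.1"
    return "GET" if len(request_line.encode("ascii")) <= limit else "POST"

print(choose_method("/search", {"q": "ferret"}))    # GET
print(choose_method("/search", {"q": "x" * 9000}))  # POST
```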


References

  1. T. Berners-Lee; R. Fielding; L. Masinter (January 2005). "RFC 3986". "Syntax Components" (section 3).
  2. T. Berners-Lee; R. Fielding; L. Masinter (January 2005). "RFC 3986". "Query" (section 3.4).
  3. Forms in HTML documents. W3.org. Retrieved on 2013-09-08.
  4. "ServletRequest (Java EE 6 )". docs.oracle. 2011-02-10. Retrieved 2013-09-08.
  5. "uri – Authoritative position of duplicate HTTP GET query keys". Stack Overflow. 2013-06-09. Retrieved 2013-09-08.
  6. Performance, Implementation, and Design Notes. W3.org. Retrieved on 2013-09-08.
  7. "4.10 Forms — HTML5".
  8. [1], HTML 5.2, W3C recommendation, 14 December 2017.
  9. "<isindex>". HTML (HyperText Markup Language).
  10. "HTML/Elements/isindex". W3C Wiki.
  11. "HTML URL Encoding Reference". W3Schools. Retrieved May 1, 2013.
  12. The application/x-www-form-urlencoded encoding algorithm, HTML 5.2, W3C recommendation, 14 December 2017.
  13. HTTP/1.1 Message Syntax and Routing. ietf.org. Retrieved on 2014-07-31.
  14. core – Apache HTTP Server. Httpd.apache.org. Retrieved on 2013-09-08.