• What a web developer needs to know about the HTTP protocol. HTTP protocol rules

    Because a message can specify how its content is encoded, the client and server can exchange binary data even though the protocol itself is text-based.

    Proxy servers

    History of development

    HTTP/0.9

    In addition to the ordinary GET, conditional and partial GET requests are distinguished. Conditional GET requests contain If-Modified-Since, If-Match, If-Range, and similar headers. Partial GET requests contain Range. How such requests are processed is defined separately by the standards.
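
    A minimal sketch of how a server might evaluate a conditional GET that carries If-Modified-Since. The function name and the plain-dict header representation are assumptions made for illustration; a real server also handles If-Match, If-Range, and case-insensitive header names.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get_status(request_headers, last_modified):
    """Return 304 if the client's cached copy is still current, else 200.

    request_headers: dict of header name -> value (hypothetical representation).
    last_modified: timezone-aware datetime of the resource's last change.
    """
    since = request_headers.get("If-Modified-Since")
    if since is None:
        return 200  # ordinary GET: send the full response
    try:
        client_time = parsedate_to_datetime(since)
    except (TypeError, ValueError):
        return 200  # unparseable date: behave as if the header were absent
    if client_time.tzinfo is None:
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= client_time:
        return 304  # Not Modified: the client may reuse its copy
    return 200


modified = datetime(2024, 1, 10, 12, 0, 0, tzinfo=timezone.utc)
headers = {"If-Modified-Since": format_datetime(modified, usegmt=True)}
status = conditional_get_status(headers, modified)
```

    Here the resource has not changed since the date the client sent, so the sketch answers 304 and the body can be omitted.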

    HEAD

    Similar to the GET method, except that there is no body in the server response. The HEAD request is typically used to retrieve metadata, check for the existence of a resource (URL validation), and see if it has changed since it was last accessed.

    Response headers may be cached. If a resource's metadata does not match the corresponding information in the cache, the copy of the resource is marked as out of date.

    POST

    Used to transfer user data to a specified resource. For example, on blogs, visitors can typically enter their comments on posts into an HTML form, after which they are POSTed to the server and placed on the page. In this case, the transmitted data (in the example with blogs, the text of the comment) is included in the body of the request. Similarly, using the POST method, files are usually uploaded to the server.

    Unlike the GET method, the POST method is not considered idempotent: repeating the same POST request may produce different results (for example, each time a comment is submitted, one more copy of that comment appears).

    If the result is 200 (OK), the response body should include a message about the completion of the request. If a resource has been created, the server SHOULD return a 201 (Created) response with the URI of the new resource in the Location header.

    The server response message to the POST method is not cached.

    PUT

    Used to upload the request content to the URI specified in the request. If no resource exists at the given URI, the server creates it and returns status 201 (Created). If an existing resource has been changed, the server returns 200 (OK) or 204 (No Content). The server MUST NOT ignore invalid Content-* headers sent by the client along with the message. If any of these headers cannot be recognized or is not valid under current conditions, the error code 501 (Not Implemented) must be returned.

    The fundamental difference between the POST and PUT methods lies in how the purpose of the resource URI is understood. The POST method assumes that the specified URI identifies a resource that will process the content sent by the client. With PUT, the client assumes that the content being uploaded corresponds to the resource located at the given URI.
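
    The difference can be illustrated with a toy in-memory store. This is only a sketch: the ResourceStore class, its methods, and the URIs are hypothetical and stand in for a real server.

```python
class ResourceStore:
    """In-memory stand-in for a server (hypothetical, for illustration only)."""

    def __init__(self):
        self.resources = {}
        self.next_id = 1

    def post(self, handler_uri, content):
        # POST: the URI names a handler; the server chooses the new URI.
        uri = f"{handler_uri}/{self.next_id}"
        self.next_id += 1
        self.resources[uri] = content
        return 201, {"Location": uri}

    def put(self, uri, content):
        # PUT: the URI names the resource itself; the content replaces it.
        created = uri not in self.resources
        self.resources[uri] = content
        return (201, {}) if created else (204, {})


store = ResourceStore()
first = store.post("/comments", "Nice post")
second = store.post("/comments", "Nice post")  # a second copy appears
put_first = store.put("/pages/about", "v1")    # creates the resource
put_again = store.put("/pages/about", "v1")    # same state as before
```

    Repeating the POST leaves two comments in the store, while repeating the PUT leaves a single resource in the same state, which is what makes PUT idempotent.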

    Server response messages to the PUT method are not cached.

    PATCH

    Similar to PUT, but applies only to a fragment of the resource.

    DELETE

    Deletes the specified resource.

    TRACE

    Returns the received request so that the client can see what information intermediate servers add or change in the request.

    LINK

    Establishes a connection between the specified resource and others.

    UNLINK

    Removes the connection of the specified resource with others.

    CONNECT

    Converts the request connection into a transparent TCP/IP tunnel, typically to make it possible to establish a secure SSL connection through an unencrypted proxy.

    Status Codes

    The status code is part of the first line of the server response. It is a three-digit integer. The first digit indicates the class of the status. The code is usually followed, after a space, by an explanatory phrase in English that tells a human reader the reason for this particular response. Examples:

    • 201 Webpage Created
    • 403 Access allowed only for registered users
    • 507 Insufficient Storage

    The client learns from the response code about the results of its request and determines what actions to take next. The set of status codes is a standard and they are described in the corresponding RFCs. The introduction of new codes should be made only after agreement with the IETF. The client may not know all status codes, but it must respond according to the class of the code.
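
    As a sketch of responding "according to the class of the code", a client could parse the status line and fall back on the first digit when the exact code is unfamiliar. The function names are hypothetical; the class names follow the HTTP specification.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def parse_status_line(line):
    """Split 'HTTP/1.1 507 Insufficient Storage' into its three parts."""
    parts = line.split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""  # the phrase may be absent
    return version, code, reason


def status_class(code):
    """Map any three-digit status code to its class via the first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


version, code, reason = parse_status_line("HTTP/1.1 507 Insufficient Storage")
kind = status_class(code)
```

    A client that has never seen 507 can still treat it as a server error because only the leading digit is consulted.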

    There are currently five classes of status codes.

    1xx Informational

    This class contains codes that report on the progress of the transfer. In HTTP/1.0, messages with such codes should be ignored. In HTTP/1.1, the client must be prepared to accept this class of messages as a normal response, but does not need to send anything to the server. The messages themselves contain only the response start line and, if required, a few response-specific header fields. Proxy servers must forward such messages from the server to the client.

    2xx Success

    Messages of this class inform about cases of successful acceptance and processing of a client’s request. Depending on the status, the server may also transmit the headers and body of the message.

    3xx Redirection

    Class 3xx codes tell the client that, to complete the operation successfully, it must make another request (usually to a different URI). Five codes of this class relate directly to redirection. The server specifies the address to which the client should make the request in the Location header. The target URI may include a fragment.

    4xx Client Error

    The 4xx code class is intended to indicate errors on the client side. When using all methods except HEAD, the server must return a hypertext explanation to the user in the body of the message.

    Illustrative mnemonic techniques exist for remembering the meanings of codes 400 through 417.

    5xx Server Error

    The 5xx codes are reserved for cases in which the operation failed through the fault of the server. For all methods other than HEAD, the server must include in the message body an explanation that the client will display to the user.

    Headers

    Message body

    The HTTP message body (message-body), if present, is used to convey the body of the object associated with the request or response. The message-body differs from the entity-body only when transfer encoding is applied, as indicated by the Transfer-Encoding header field.

    message-body = entity-body | <entity-body encoded as per Transfer-Encoding>

    The Transfer-Encoding field must be used to indicate any transfer encoding applied by the application to ensure that the message is transmitted securely and correctly. The Transfer-Encoding field is a property of the message, not the object, and thus can be added or removed by any application in the request/response chain.
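
    To make the distinction between entity-body and message-body concrete, here is a sketch that uses chunked as the transfer coding. Chunked is not named in the text above and is chosen here only as a common example; the function names are hypothetical, and chunk extensions and trailers are not supported.

```python
def encode_chunked(entity_body, chunk_size=8):
    """Wrap an entity-body in chunked transfer coding (no trailers)."""
    out = bytearray()
    for start in range(0, len(entity_body), chunk_size):
        chunk = entity_body[start:start + chunk_size]
        # Each chunk: size in hex, CRLF, data, CRLF.
        out += f"{len(chunk):X}\r\n".encode("ascii") + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # zero-length chunk terminates the body
    return bytes(out)


def decode_chunked(message_body):
    """Recover the entity-body from a chunked message-body."""
    body = bytearray()
    pos = 0
    while True:
        line_end = message_body.index(b"\r\n", pos)
        size = int(message_body[pos:line_end], 16)
        pos = line_end + 2
        if size == 0:
            return bytes(body)
        body += message_body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF


entity = b"Hello, chunked world!"
wire = encode_chunked(entity)
restored = decode_chunked(wire)
```

    The bytes on the wire differ from the entity, yet any intermediary that understands the coding can remove it and recover the same entity-body.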

    The rules governing whether a message body is acceptable in a message are different for requests and responses.

    The presence of a message body in a request is indicated by adding a Content-Length or Transfer-Encoding header field to the request headers. A message body (message-body) MAY only be added to a request when the request method allows an entity-body.

    Whether a message-body is included in a response depends on both the request method and the response status code. Responses to a HEAD request must never include a message-body, even if entity-header fields are present that suggest an entity exists. Responses with status codes 1xx (Informational), 204 (No Content), and 304 (Not Modified) must not contain a message-body. All other responses contain a message body, even if it has zero length.
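
    These rules fit into a small predicate. The following is a sketch with a hypothetical function name, covering only the cases listed above.

```python
def response_has_body(request_method, status_code):
    """Apply the response-body rules: HEAD, 1xx, 204 and 304 carry no body."""
    if request_method.upper() == "HEAD":
        return False
    if 100 <= status_code < 200 or status_code in (204, 304):
        return False
    return True  # every other response has a body, possibly of zero length


checks = [
    response_has_body("GET", 200),
    response_has_body("HEAD", 200),
    response_has_body("GET", 304),
    response_has_body("PUT", 204),
]
```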

    HTTP Dialog Examples

    Regular GET request

    There are two main types of content negotiation:

    • Server-driven.
    • Agent-driven (client-driven).

    The two types can be used together or each on its own.

    The main protocol specification (RFC 2616) also singles out so-called transparent negotiation as the preferred way of combining both types. This mechanism should not be confused with the independent technology Transparent Content Negotiation (TCN, see RFC 2295), which is not part of the HTTP protocol but can be used with it. The two differ significantly in how they work and in the very meaning of the word "transparent". In the HTTP specification, transparency means that the process is invisible to the client and server; in TCN, transparency means that the full list of resource variants is available to all participants in the data delivery process.

    Server-driven

    If there are multiple versions of a resource, the server can analyze the client's request headers to return what it believes is the most appropriate version. The main headers analyzed are Accept, Accept-Charset, Accept-Encoding, Accept-Language, and User-Agent. The server should include a Vary header in the response indicating the parameters by which the content of the requested URI differs.
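
    A simplified sketch of server-driven selection by Accept-Language follows. The function names are hypothetical, and the matching is deliberately minimal: quality values are honored and a range such as "en" matches a tag such as "en-US", but other details of the specification are omitted.

```python
def parse_accept(header):
    """Parse an Accept-* header into (value, q) pairs, highest q first."""
    items = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        value, *params = [p.strip() for p in part.split(";")]
        q = 1.0  # default quality when no q parameter is given
        for param in params:
            name, _, raw = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(raw)
                except ValueError:
                    q = 0.0
        items.append((value.lower(), q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(items, key=lambda item: item[1], reverse=True)


def choose_language(accept_language, available, default):
    """Pick a variant and build the headers that describe the choice."""
    chosen = default
    for lang, q in parse_accept(accept_language):
        if q <= 0 or lang == "*":
            continue  # simplification: "*" falls through to the default
        match = next((c for c in available
                      if c.lower() == lang or c.lower().startswith(lang + "-")),
                     None)
        if match is not None:
            chosen = match
            break
    return chosen, {"Content-Language": chosen, "Vary": "Accept-Language"}


variant, response_headers = choose_language("ru, en;q=0.8", ["en", "ru"], "en")
```

    The Vary header in the result tells caches that the response depends on the request's Accept-Language value.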

    The client's geographic location can be determined from the remote IP address. This is possible because IP addresses, like domain names, are registered to a specific person or organization. At registration, the region in which the address space will be used is specified. This data is publicly available, and freely available databases and ready-made software modules for working with them can be found on the Internet (search for the keywords "Geo IP").

    Keep in mind that this method can determine the location at best to the city level (the country is derived from it). Moreover, the information is accurate only as of the time the address space was registered. For example, if a Moscow provider registers a range of addresses as Moscow and then begins serving clients in the nearby Moscow region, its subscribers may find that some sites place them in Moscow rather than in Krasnogorsk or Dzerzhinsky.

    Server-driven negotiation has several disadvantages:

    • The server only guesses which variant the end user would prefer, but cannot know exactly what is needed at the moment (for example, the Russian or the English version).
    • Many Accept-group headers are sent, but few resources have multiple variants, so the equipment carries unnecessary load.
    • A shared cache is limited in its ability to serve the same response to identical requests from different users.
    • Sending Accept headers may also reveal information about the user's preferences, such as the languages used, the browser, and the encoding.

    Client-driven

    In this case, the content type is determined only on the client side. To do this, the server returns a list of variants with status code 300 (Multiple Choices) or 406 (Not Acceptable), from which the user selects the appropriate one. Client-driven negotiation works well when content varies along common dimensions (such as language and encoding) and a public cache is used.

    The main disadvantage is extra load, since you have to make an additional request to get the desired content.

    Transparent negotiation

    This negotiation is completely transparent to the client and server. In this case, a shared cache is used that contains a list of options, similar to client-driven negotiation. If the cache understands all these options, then it makes the choice itself, as in server-driven negotiation. This reduces the load on the origin server and eliminates the additional request from the client.

    The core HTTP specification does not describe the transparent negotiation mechanism in detail.

    Multiple Contents

    The HTTP protocol supports the transfer of multiple entities within a single message. Entities can be transmitted not only as a flat sequence but also as a hierarchy with parts nested inside one another. The multipart/* media types are used to indicate multiple content. Such types are handled according to the general rules described in RFC 2046 (unless a specific media type specifies otherwise). If the recipient does not know how to handle a type, it treats it the same way as multipart/mixed.

    The boundary parameter specifies the separator between the parts of the transmitted message. For example, the DestAddress parameter submitted from a form carries the e-mail address, and the AttachedFile1 element that follows it carries the binary content of an image in .jpg format.
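
    The role of the boundary can be sketched with a small encoder for multipart/form-data. The function name and the sample values (address, file name, file bytes) are hypothetical; the field names DestAddress and AttachedFile1 are the ones mentioned above. No escaping of quotes in names is attempted.

```python
def build_multipart(fields, files, boundary):
    """Encode text fields and files as a multipart/form-data body.

    fields: dict name -> str value.
    files: dict name -> (filename, content_type, bytes).
    """
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields.items():
        lines += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"),
            b"",  # empty line separates part headers from part content
            value.encode("utf-8"),
        ]
    for name, (filename, content_type, data) in files.items():
        disposition = (f'Content-Disposition: form-data; name="{name}"; '
                       f'filename="{filename}"')
        lines += [
            delimiter,
            disposition.encode("utf-8"),
            f"Content-Type: {content_type}".encode("ascii"),
            b"",
            data,  # binary content is inserted as-is
        ]
    lines += [delimiter + b"--", b""]  # closing delimiter ends the message
    body = b"\r\n".join(lines)
    return f'multipart/form-data; boundary="{boundary}"', body


content_type, body = build_multipart(
    {"DestAddress": "someone@example.com"},
    {"AttachedFile1": ("photo.jpg", "image/jpeg", b"\xff\xd8\xff")},
    "Asrf456BGe4h",
)
```

    The boundary string must not occur inside any part, which is why clients usually generate a random one.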

    On the server side, messages with multiple contents can be sent in response to requests for multiple resource fragments. In this case, the media type multipart/byteranges is used.

    On the client side, the POST method is most often used when submitting an HTML form. A typical example is a page for sending e-mail with file attachments. When such a message is sent, the browser generates a message of type multipart/form-data and includes in it, as separate parts, the subject entered by the user, the recipient's address, the message text, and the attached files:

    POST /send-message.html HTTP/1.1
    Host: mail.example.com
    Referer: http://mail.example.com/send-message.html
    User-Agent: BrowserForDummies/4.67b
    Content-Type: multipart/form-data; boundary="Asrf456BGe4h"
    Content-Length: (total volume including child headers)
    Connection: keep-alive
    Keep-Alive: 300
    (empty line)
    (missing preamble)
    --Asrf456BGe4h
    Content-Disposition: form-data; name="DestAddress"
    (empty line)
    [email protected]
    --Asrf456BGe4h
    Content-Disposition: form-data; name="MessageTitle"
    (empty line)
    I'm indignant
    --Asrf456BGe4h
    Content-Disposition: form-data; name="MessageText"
    (empty line)
    Hello Vasily! Your pet lion, which you left with me last week, tore up my entire sofa. Please pick him up soon! Attached are two photos with the consequences.
    --Asrf456BGe4h
    Content-Disposition: form-data; name="AttachedFile1"; filename="horror-photo-1.jpg"
    Content-Type: image/jpeg
    (empty line)
    (binary content of the first photo)
    --Asrf456BGe4h
    Content-Disposition: form-data; name="AttachedFile2"; filename="horror-photo-2.jpg"
    Content-Type: image/jpeg
    (empty line)
    (binary content of second photo)
    --Asrf456BGe4h--
    (missing epilogue)

    In the example in the Content-Disposition headers, the name parameter corresponds to name attribute in HTML tags And