What HTTP is and how it works: HTTP in simple terms

    Purpose of the lecture: to build an understanding of how the HTTP and HTTPS protocols work.

    HTTP (HyperText Transfer Protocol) is one of the most important protocols enabling data transfer over the Internet. It belongs to the application layer (layer 7) of the OSI model and runs on top of TCP.

    Because HTTP is an application-layer protocol, applications can use it directly to organize network communication. HTTP is also a critical part of web applications: the browser uses it to interact with the server and obtain the data it needs.

    HTTP transfers data in "request-response" mode. Data of almost any type can be transmitted this way: plain text, hypertext (HTML), style sheets, client scripts, images, documents in various formats, binary data, and so on.

    Within the HTTP protocol, there is always a clear distinction between client and server. The client is always the initiator of the interaction. The server, in turn, listens for all incoming connections and processes each of them. Since HTTP communication operates on a request-response basis, an HTTP request must be generated to initiate a data transfer session. As part of this request, the client describes what resource it wants to receive from the server, and also specifies various additional options. After this, the request is sent to the server and it, in turn, processes the request and generates an HTTP response, which contains service information and the contents of the resource that was requested. In general, the process can be schematically depicted as follows.
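
    The exchange described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the client and the server are two ends of a local socket pair instead of a real network connection, and the resource name /hello.txt and the host name example.test are made up for the example.

```python
import socket
import threading


def serve_once(conn: socket.socket) -> None:
    """Server side: read one request, send one response, close the connection."""
    data = b""
    while b"\r\n\r\n" not in data:          # the headers end with an empty line
        chunk = conn.recv(4096)
        if not chunk:
            break
        data += chunk
    request_line = data.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = request_line.split(" ")
    if len(parts) == 3 and parts[0] == "GET" and parts[1] == "/hello.txt":
        status, body = "200 OK", b"Hello, client!\n"
    else:
        status, body = "404 Not Found", b"No such resource\n"
    head = (
        f"HTTP/1.1 {status}\r\n"
        "Content-Type: text/plain\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    conn.sendall(head.encode("ascii") + body)
    conn.close()


def fetch(path: str) -> bytes:
    """Client side: initiate the exchange and return the raw response."""
    client, server = socket.socketpair()
    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()
    request = f"GET {path} HTTP/1.1\r\nHost: example.test\r\n\r\n"
    client.sendall(request.encode("ascii"))
    response = b""
    while True:                              # read until the server closes
        chunk = client.recv(4096)
        if not chunk:
            break
        response += chunk
    worker.join()
    client.close()
    return response


if __name__ == "__main__":
    print(fetch("/hello.txt").decode("ascii"))
```

    The client always speaks first; the server only reacts to what it receives, which is the essence of the request-response model.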


    An HTTP request and an HTTP response are similar in structure and are called HTTP messages. In fact, all interaction within the HTTP protocol comes down to forwarding HTTP messages. Each HTTP message is plain text information presented in a specific format. Let's take a closer look at the HTTP message format.

    Each HTTP message consists of several lines. The first line is always the start line; it differs significantly between an HTTP request and an HTTP response and usually contains general information about the message. The start line is followed by the HTTP headers, each on its own line. HTTP headers are present in both the HTTP request and the HTTP response. Their purpose is to qualify the HTTP message so that the receiving party can process it more accurately. The number of headers is variable and depends on the specific message: if the sender considers a particular header necessary, it adds it; otherwise it leaves it out. An HTTP header consists of a name and a value, and the name defines the header's purpose. The headers are followed by an empty line, and the empty line is followed by the body of the HTTP message. Thus, the general structure of an HTTP message can be represented as follows.
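
    The structure just described (start line, headers, empty line, body) can be made concrete with a small parser. The function name parse_http_message is an assumption made for this sketch, and the parser deliberately ignores details such as repeated headers.

```python
from typing import Dict, Tuple


def parse_http_message(raw: bytes) -> Tuple[str, Dict[str, str], bytes]:
    """Split a raw HTTP message into (start line, headers, body)."""
    # The first empty line separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


if __name__ == "__main__":
    sample = (
        b"GET /images/corner1.png HTTP/1.1\r\n"
        b"Host: example.test\r\n"
        b"Accept-Encoding: gzip, deflate\r\n"
        b"\r\n"
    )
    print(parse_http_message(sample))
```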


    An HTTP request is generated on the client and sent to the server in order to obtain information from it. It describes the resource to be retrieved and carries additional information. The first line contains the request method (which we will look at later in this lecture), the resource name (a relative path on the server) and the protocol version. For example, the start line could be "GET /images/corner1.png HTTP/1.1". Such a request asks the server to return the image "corner1.png" located in the folder "images". HTTP headers matter in a request because they carry clarifying information: the browser version, the client's ability to accept compressed content, its caching capabilities and other important parameters that can influence how the response is formed. The body of an HTTP request usually contains information to be sent to the server. For example, when a file is uploaded, its contents are placed in the body of the HTTP request. However, not every HTTP method is meant to carry data in the request body. For example, a request that uses the GET method normally has an empty body. So a standard HTTP request might look like this.


    In this HTTP request, the client contacts the server "microsoft.com", requests the resource "images/corner.png", indicates that it can accept content compressed with "gzip" or "deflate", states that its language is English, and reports its browser version. As noted earlier, the number and set of headers can vary significantly. Another example of an HTTP request follows.


    This request differs from the previous one in that it uses the POST method, which also uploads data to the server. In this case, the data itself is contained in the body of the HTTP request after an empty line.
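
    Both kinds of request described above can be assembled with one small helper. This is a sketch, not the requests from the original figures: the User-Agent value, the path /submit and the form data are hypothetical, chosen only to match the prose.

```python
from typing import Dict


def build_request(method: str, path: str, headers: Dict[str, str],
                  body: bytes = b"") -> bytes:
    """Assemble a raw HTTP/1.1 request: start line, headers, empty line, body."""
    all_headers = dict(headers)
    if body:
        # The receiver needs the body size to know where the message ends.
        all_headers["Content-Length"] = str(len(body))
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in all_headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


if __name__ == "__main__":
    # A GET request: no body, only the start line and headers.
    print(build_request("GET", "/images/corner.png", {
        "Host": "microsoft.com",
        "Accept-Encoding": "gzip, deflate",
        "Accept-Language": "en",
        "User-Agent": "ExampleBrowser/1.0",      # hypothetical browser name
    }).decode("ascii"))

    # A POST request: the data travels in the body after the empty line.
    print(build_request("POST", "/submit", {     # hypothetical path
        "Host": "microsoft.com",
        "Content-Type": "application/x-www-form-urlencoded",
    }, b"name=value").decode("ascii"))
```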

    An HTTP response is generated by the web server in reply to an incoming HTTP request. It is similar in structure to an HTTP request but has certain differences, the main one being the first line. Instead of the request method and the name of the requested resource, it gives the status of the response, which indicates how successful the HTTP request was. For example, if the document is found on the server and can be returned to the client, the status is "OK", meaning that the request completed successfully. Exceptional situations are possible, however: the document may be missing from the server, or the user may lack the rights to access the resource. We will consider the set of HTTP response statuses later in this lecture. Thus, the first line of an HTTP response may be "HTTP/1.1 200 OK". HTTP headers are also an important element of the HTTP response. They describe the content that is sent to the client: for example, the content type (HTML document, image, etc.), the content length (size in bytes), the modification date, the caching mode, and so on. These headers affect how the data is displayed on the client and set the rules for storing the data in the client cache. A typical HTTP response might look like this:


    In the example above, the server indicates that the resource has been found, its type is an HTML document, and also indicates the size and modification date. After the empty line comes the contents of the HTML document, i.e. essentially what the client requested. As with an HTTP request, the number of headers in an HTTP response can vary at the discretion of the web server.
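
    The first line of a response, such as "HTTP/1.1 200 OK", is easy to take apart programmatically. The helper below is a minimal sketch; the name parse_status_line is an assumption, not part of any library.

```python
from typing import Tuple


def parse_status_line(line: str) -> Tuple[str, int, str]:
    """Split a response start line into (protocol version, status code, reason)."""
    # Split on the first two spaces only: the reason phrase may contain spaces.
    version, code, *rest = line.split(" ", 2)
    if not version.startswith("HTTP/") or len(code) != 3 or not code.isdigit():
        raise ValueError(f"not a valid status line: {line!r}")
    reason = rest[0] if rest else ""
    return version, int(code), reason


if __name__ == "__main__":
    print(parse_status_line("HTTP/1.1 200 OK"))
    print(parse_status_line("HTTP/1.1 404 Not Found"))
```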

    When we looked at the structure of an HTTP request, we touched on the concept of the HTTP request method. The method determines how the HTTP request will be processed; in a sense, it defines the semantics of the request. Since HTTP requests can mean very different things, specifying the method is an important part of constructing a request. An HTTP request can mean, for example: requesting a resource from the server, creating or modifying a resource on the server, deleting a resource on the server, and so on.

    The most common HTTP request methods are the following:

    GET retrieves information from the server; the request body is normally empty;
    HEAD is similar to GET, but the response body is always empty; it lets you check that the requested resource is available and read the HTTP headers of the response;
    POST uploads information to the server; in essence it changes a resource on the server, but it is often used to create one; the request body contains the resource being changed or created;
    PUT is similar to POST, but the request body contains the complete resource to be stored at the requested URI, which is created if it does not yet exist;
    DELETE removes a resource from the server.

    Besides these HTTP methods, the HTTP specification defines a number of others. Nevertheless, browsers often use only the GET and POST methods. Other applications may use the HTTP methods at their discretion.

    As we saw earlier, the HTTP response contains a status code (return code). It shows the outcome of processing the HTTP request on the server. This mechanism is essential to HTTP, because various non-standard situations can arise while a request is being handled. All status codes are three-digit numbers. In addition, the HTTP response may contain a textual description of the status. All status codes are divided into five groups.

    Each group of status codes identifies the situation in which the request found itself. The group is determined by the first digit of the status code. For example, status codes of the 2xx group indicate the success of the HTTP request. The most commonly used status codes are shown in the table below.

    Code  Description
    1xx   Informational codes
    2xx   Successful completion of the request
    200   OK (the request was processed successfully)
    201   Created
    202   Accepted
    203   Non-authoritative information
    204   No content
    205   Reset content
    206   Partial content (for example, when resuming a file download)
    3xx   Redirection (further action is required to complete the request)
    300   Multiple choices
    301   Moved permanently
    302   Moved temporarily
    303   See other
    304   Not modified
    305   Use proxy
    4xx   Client errors (the problem is with the request, not the server)
    400   Bad request
    401   Unauthorized (authentication required)
    402   Payment required
    403   Forbidden (access denied)
    404   Not found
    405   Method not allowed
    406   Not acceptable
    407   Proxy authentication required
    408   Request timeout
    409   Conflict
    410   Gone (the resource is no longer available)
    411   Length required
    412   Precondition failed
    413   Request entity too large
    414   Request URI too long
    415   Unsupported media type
    5xx   Server errors
    500   Internal server error
    501   Not implemented
    502   Bad gateway
    503   Service unavailable
    504   Gateway timeout
    505   HTTP version not supported

    These and other status codes convey information about the outcome of a request from the server to the client.
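
    Because the group is determined by the first digit, classifying a status code takes a single integer division. The group labels below paraphrase the table, and the function name status_group is an assumption made for this sketch.

```python
STATUS_GROUPS = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_group(code: int) -> str:
    """Return the name of the group a three-digit status code belongs to."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    # Integer division by 100 leaves only the first digit.
    return STATUS_GROUPS[code // 100]


if __name__ == "__main__":
    for code in (200, 301, 404, 503):
        print(code, status_group(code))
```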

    A distinctive feature of HTTP is that information is transmitted as text, which makes the protocol fairly simple to work with. In addition, network security engineers usually leave HTTP traffic allowed even under a strict security policy. For these reasons, building network communication on top of HTTP is a promising approach.

    However, despite the simplicity of the protocol, there is a problem of leakage of transmitted information. Since information is transmitted in plain text, intercepting such information is quite simple. In some situations this problem is not critical. However, for web applications that work with confidential information, this is a fairly significant drawback.

    For this reason, there is a modification of this protocol - HTTPS, i.e. HTTP protocol with encryption support.

    As is well known, classical strong encryption algorithms encrypt data using a key. The same key is used to encrypt and to decrypt the data, so anyone who knows the key can decrypt the information; this is symmetric encryption. A key is simply a sequence of bits of a certain length, and in general the longer the key, the harder it is to break the encryption. To protect the information, therefore, the encryption key must be kept secret. But how can this be achieved when communicating over HTTP? If the key is transmitted in clear text, encryption becomes pointless. The solution is a second kind of encryption, asymmetric encryption, which uses a pair of keys: a public key and a private key. The public key can only encrypt information, and only the private key can decrypt it. The private key is kept secret, while the public key is freely available. Asymmetric algorithms are slower than symmetric ones, however, so they are used only for the initial exchange of a symmetric key. Let's look at the whole algorithm by which an encrypted HTTP connection works.


    When a client contacts a server over a secure channel, the server holds a public/private key pair. First, the server sends its public key to the client. The client randomly generates a symmetric encryption key and encrypts it with the public key received from the server. The client then sends the encrypted key to the server, and from that moment the client and the server share the same symmetric key. The subsequent HTTP exchange is encrypted with this symmetric key. The symmetric key cannot be recovered by an eavesdropper, because the private key (the only key that can decrypt the first message containing the symmetric key) never leaves the server. In this way the confidentiality and integrity of data transmitted over HTTP are ensured.
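
    The key exchange described above can be illustrated with a deliberately tiny toy model. Everything here is an assumption made for the illustration: the RSA numbers are textbook-sized and trivially breakable, the symmetric cipher is a simple XOR with a hash-derived keystream, and real HTTPS uses the TLS protocol, which involves certificates and considerably more steps. The sketch shows only the idea of protecting a symmetric key with an asymmetric one.

```python
import hashlib
import secrets

# Toy RSA key pair held by the "server" (textbook-sized numbers, NOT secure).
P, Q = 61, 53
N = P * Q        # public modulus (3233)
E = 17           # public exponent; (N, E) is the public key
D = 2753         # private exponent; (E * D) % ((P - 1) * (Q - 1)) == 1


def xor_cipher(key: int, data: bytes) -> bytes:
    """Toy symmetric cipher: the same call both encrypts and decrypts."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(f"{key}:{counter}".encode()).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def client_make_session_key():
    """Client: pick a random symmetric key and encrypt it with the public key."""
    session_key = secrets.randbelow(N - 2) + 2        # a value in 2 .. N-1
    return session_key, pow(session_key, E, N)


def server_recover_session_key(encrypted_key: int) -> int:
    """Server: decrypt the symmetric key with the private key."""
    return pow(encrypted_key, D, N)


if __name__ == "__main__":
    key, encrypted_key = client_make_session_key()    # only encrypted_key is sent
    server_key = server_recover_session_key(encrypted_key)
    ciphertext = xor_cipher(key, b"GET /account HTTP/1.1")
    print(xor_cipher(server_key, ciphertext))
```

    Only the encrypted key and the ciphertext travel over the wire; without the private exponent an eavesdropper cannot recover the session key (in a real system, that is, where the numbers are large enough).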

    Brief summary

    All web applications are based on the HTTP protocol. HTTP transmits text information and operates in request-response mode. An HTTP request and an HTTP response have a strictly defined structure: a start line, headers and a message body. The number of HTTP headers is variable. HTTP headers are separated from the message body by an empty line. Every HTTP request specifies an HTTP method, which determines the semantics of the request (get a resource, add, change, delete, etc.). In addition to service information and the payload, the HTTP response carries a status, which informs the client about the outcome of the request. All status codes are divided into groups. Since data transmitted via HTTP can be intercepted, HTTP does not ensure the confidentiality of the transmitted information. If that level of security is required, the HTTPS protocol must be used; it encrypts the transmitted information using a combination of symmetric and asymmetric encryption algorithms.

    Because a message can specify how its content is encoded, the client and the server can exchange binary data even though the protocol itself is text-based.

    In addition to the ordinary GET, conditional GET and partial GET requests are distinguished. Conditional GET requests contain If-Modified-Since, If-Match, If-Range or similar headers. Partial GET requests contain a Range header. The way such requests are processed is defined separately by the standards.
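
    As an illustration of a conditional GET, the sketch below shows how a server might decide between a full response and 304 (Not Modified) when the request carries If-Modified-Since. The function name and the policy of ignoring an unparseable date are assumptions made for this example; other conditional headers are not covered.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def conditional_get_status(last_modified: datetime,
                           if_modified_since: Optional[str]) -> int:
    """Return 304 if the resource is unchanged since the given date, else 200."""
    if if_modified_since is None:
        return 200                          # ordinary, unconditional GET
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200                          # unparseable date: ignore the condition
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop the microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


if __name__ == "__main__":
    modified = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    header = format_datetime(modified, usegmt=True)
    print(header, conditional_get_status(modified, header))
```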

    HEAD

    Similar to the GET method, except that there is no body in the server response. The HEAD request is typically used to retrieve metadata, check for the existence of a resource (URL validation), and see if it has changed since it was last accessed.

    Response headers may be cached. If a resource's metadata does not match the corresponding information in the cache, the copy of the resource is marked as out of date.

    POST

    Used to transfer user data to a specified resource. For example, on blogs, visitors can typically enter comments on posts into an HTML form, after which they are POSTed to the server and placed on the page. In this case, the transmitted data (in the example with blogs, the text of the comment) is included in the body of the request. Similarly, using the POST method, files are usually uploaded to the server.

    Unlike the GET method, the POST method is not considered idempotent: repeating the same POST request may produce different results (for example, each time the comment is submitted, another copy of it appears).

    If the result of execution is 200 (OK), the response body should contain a message describing the outcome of the request. If a resource has been created, the server SHOULD return a 201 (Created) response with the URI of the new resource in the Location header.

    The server response message to the POST method is not cached.

    PUT

    Used to upload the request content to the URI specified in the request. If no resource exists at the given URI, the server creates it and returns status 201 (Created). If the resource has been changed, the server returns 200 (OK) or 204 (No Content). The server MUST NOT ignore invalid Content-* headers sent by the client along with the message. If any of these headers cannot be recognized or is not valid under the current conditions, error code 501 (Not Implemented) must be returned.

    The fundamental difference between the POST and PUT methods lies in how the resource URI is understood. The POST method assumes that the resource at the specified URI will process the content sent by the client. With PUT, the client assumes that the content being uploaded corresponds to the resource located at the given URI.
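
    The difference can be illustrated with a small in-memory model. The class ResourceStore and the scheme for naming resources created by POST are hypothetical; the point is only that repeating a PUT leaves one resource at the same URI, while repeating a POST creates a new resource each time.

```python
from typing import Dict, Tuple


class ResourceStore:
    """A toy server-side store keyed by URI."""

    def __init__(self) -> None:
        self.resources: Dict[str, bytes] = {}
        self.next_id = 1

    def put(self, uri: str, content: bytes) -> int:
        """PUT: the content becomes the resource at exactly this URI."""
        created = uri not in self.resources
        self.resources[uri] = content
        return 201 if created else 204      # Created, or No Content on update

    def post(self, uri: str, content: bytes) -> Tuple[int, str]:
        """POST: the resource at `uri` processes the content; here it
        stores it as a new subordinate resource and reports its location."""
        new_uri = f"{uri.rstrip('/')}/{self.next_id}"
        self.next_id += 1
        self.resources[new_uri] = content
        return 201, new_uri                 # status and Location value


if __name__ == "__main__":
    store = ResourceStore()
    print(store.put("/notes/today", b"first draft"))
    print(store.put("/notes/today", b"first draft"))
    print(store.post("/comments", b"Nice post!"))
    print(store.post("/comments", b"Nice post!"))
```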

    Server response messages to the PUT method are not cached.

    PATCH

    Similar to PUT, but applies only to a fragment of the resource.

    DELETE

    Deletes the specified resource.

    TRACE

    Returns the received request so that the client can see what information intermediate servers add or change in the request.

    LINK

    Establishes a connection between the specified resource and others.

    UNLINK

    Removes the connection of the specified resource with others.

    CONNECT

    Converts the request connection into a transparent TCP/IP tunnel, typically to allow a secure SSL connection to be established through an unencrypted proxy.

    Status codes

    The status code is part of the first line of the server response. It is a three-digit integer whose first digit indicates the class of the status. The code is usually followed, after a space, by an explanatory phrase in English that tells a human reader the reason for this particular response. Examples:

    201 Webpage Created
    403 Access allowed only for registered users
    507 Insufficient Storage

    The client learns from the response code about the results of its request and determines what actions to take next. The set of status codes is a standard and they are described in the corresponding RFCs. The introduction of new codes should be made only after agreement with the IETF. The client may not know all status codes, but it must respond according to the class of the code.
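
    One way a client can "respond according to the class" is to treat a code it does not recognize as the generic x00 code of the same class. The sketch below uses the codes known to Python's standard http.HTTPStatus enumeration as the client's vocabulary; the function name effective_status is an assumption.

```python
from http import HTTPStatus


def effective_status(code: int) -> int:
    """Map an unrecognized status code to the x00 code of its class."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    try:
        HTTPStatus(code)                # raises ValueError for unknown codes
        return code
    except ValueError:
        return (code // 100) * 100      # e.g. an unknown 4xx is handled as 400


if __name__ == "__main__":
    for code in (404, 499, 299):
        print(code, "->", effective_status(code))
```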

    There are currently five classes of status codes.

    1xx Informational

    This class contains codes that inform about the transfer process. In HTTP/1.0, messages with such codes should be ignored. In HTTP/1.1, the client must be prepared to accept this class of messages as a normal response, but does not need to send anything to the server. The messages themselves from the server contain only the start line of the response and, if required, a few response-specific header fields. Proxy servers must send such messages further from the server to the client.

    2xx Success

    Messages of this class inform about cases of successful acceptance and processing of a client’s request. Depending on the status, the server may also transmit the headers and body of the message.

    3xx Redirection

    Codes of the 3xx class tell the client that another request must be made (usually to a different URI) for the operation to complete successfully. Five codes of this class relate directly to redirects. The server specifies the address to which the client should send the new request in the Location header. Fragments may be used in the target URI.

    4xx Client Error

    The 4xx code class is intended to indicate errors on the client side. When using all methods except HEAD, the server must return a hypertext explanation to the user in the body of the message.

    To help remember the meanings of codes 400 to 417, there are illustrative mnemonic techniques.

    5xx Server Error

    Codes 5xx are allocated for cases of unsuccessful operation due to the fault of the server. For all situations other than using the HEAD method, the server must include in the body of the message an explanation that the client will display to the user.

    Message body

    The HTTP message body (message-body), if present, is used to convey the body of the object associated with the request or response. The message-body differs from the entity-body only when transfer encoding is applied, as indicated by the Transfer-Encoding header field.

    message-body = entity-body | <entity-body encoded as per Transfer-Encoding>

    The Transfer-Encoding field must be used to indicate any transfer encoding applied by the application to ensure that the message is transmitted securely and correctly. The Transfer-Encoding field is a property of the message, not the object, and thus can be added or removed by any application in the request/response chain.

    The rules governing whether a message body is acceptable in a message are different for requests and responses.

    The presence of a message body in a request is indicated by adding a Content-Length or Transfer-Encoding header field to the request headers. A message body (message-body) MAY be added to a request only when the request method allows an entity-body.

    Whether a message body is included in a response depends on both the request method and the response status code. Responses to a HEAD request must never include a message body, even if entity-header fields are present that suggest otherwise. Responses with status codes 1xx (Informational), 204 (No Content) and 304 (Not Modified) must not contain a message body either. All other responses contain a message body, even if it has zero length.
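
    These rules for responses fit into a short predicate. The function name is an assumption made for this sketch.

```python
def response_has_body(request_method: str, status_code: int) -> bool:
    """Tell whether a response carries a message body (possibly of zero length)."""
    if request_method.upper() == "HEAD":
        return False                    # never, whatever the headers say
    if 100 <= status_code < 200:
        return False                    # 1xx (Informational)
    if status_code in (204, 304):
        return False                    # No Content, Not Modified
    return True


if __name__ == "__main__":
    print(response_has_body("GET", 200))
    print(response_has_body("HEAD", 200))
    print(response_has_body("GET", 304))
```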

    There are two main types of content negotiation:

    • Server-driven.
    • Agent-driven (client-driven).

    The two types can be used separately or together.

    The main protocol specification (RFC 2616) also singles out so-called transparent negotiation as the preferred way of combining both types. This mechanism should not be confused with the independent technology Transparent Content Negotiation (TCN, see RFC 2295), which is not part of the HTTP protocol but can be used with it. The two differ significantly both in how they work and in the very meaning of the word "transparent". In the HTTP specification, transparency means that the process is invisible to the client and the server, whereas in TCN it means that the full list of resource variants is available to all participants in the data delivery process.

    Server-driven negotiation

    If a resource exists in several versions, the server can analyze the client's request headers to return the version it considers most appropriate. The main headers analyzed are Accept, Accept-Charset, Accept-Encoding, Accept-Language and User-Agent. The server is advised to include a Vary header in the response, indicating the parameters by which the content at the requested URI varies.
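
    As an illustration, here is how a server might pick a language version from the Accept-Language header. This is a simplified sketch: the function names are assumptions, and the matching rule (a language matches if it equals a variant or shares its primary subtag) is looser than the one defined in the standards. A server that answers this way would also add "Vary: Accept-Language" to the response.

```python
from typing import List, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept-* header into (value, quality) pairs, best first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        value, *params = item.split(";")
        quality = 1.0                               # default when q is absent
        for param in params:
            name, _, raw = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(raw)
                except ValueError:
                    quality = 0.0
        prefs.append((value.strip().lower(), quality))
    # sorted() is stable, so equal qualities keep the order of the header.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def choose_language(accept_language: str, available: List[str],
                    default: str) -> str:
    """Pick the available language version the client prefers most."""
    for lang, quality in parse_accept(accept_language):
        if quality <= 0:
            continue                                # q=0 means "not acceptable"
        if lang == "*":
            return default
        for candidate in available:
            cand = candidate.lower()
            if (cand == lang or cand.startswith(lang + "-")
                    or lang.startswith(cand + "-")):
                return candidate
    return default


if __name__ == "__main__":
    print(choose_language("ru;q=0.8, en;q=0.9", ["ru", "en"], "en"))
```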

    The geographic location of the client can be determined from the remote IP address. This is possible because IP addresses, like domain names, are registered to a specific person or organization, and the registration specifies the region in which the address space will be used. This data is publicly available; freely available databases and ready-made software modules for working with them can be found on the Internet (search for the keywords "Geo IP").

    Keep in mind that this method can determine the location to the city level at best (the country is derived from it), and the information is accurate only as of the moment the address space was registered. For example, if a Moscow provider registers a range of addresses for Moscow and then starts serving customers in the nearby Moscow region, its subscribers may find that some sites consider them to be in Moscow rather than in Krasnogorsk or Dzerzhinsky.

    Server-driven negotiation has several disadvantages:

    • The server can only guess which option the end user prefers; it cannot know exactly what is needed at the moment (for example, the Russian or the English version).
    • Many Accept headers are sent, but few resources have multiple variants, so the equipment carries an unnecessary load.
    • A shared cache is limited in its ability to return the same response to identical requests from different users.
    • Sending Accept headers may also reveal some information about the user's preferences, such as the languages, browser and encoding used.

    Client-driven negotiation

    In this case the content variant is chosen only on the client side. To do this, the server returns, with status code 300 (Multiple Choices) or 406 (Not Acceptable), a list of options from which the user selects the appropriate one. Client-driven negotiation works well when content varies along common dimensions (such as language and encoding) and a public cache is used.

    The main disadvantage is extra load, since you have to make an additional request to get the desired content.

    Transparent negotiation

    This negotiation is completely transparent to the client and server. In this case, a shared cache is used that contains a list of options, similar to client-driven negotiation. If the cache understands all these options, then it makes the choice itself, as in server-driven negotiation. This reduces the load on the origin server and eliminates the additional request from the client.

    The core HTTP specification does not describe the transparent negotiation mechanism in detail.

    Multipart content

    The HTTP protocol supports the transfer of multiple entities within a single message. Moreover, entities can be transmitted not only as a flat sequence but also as a hierarchy, with elements nested inside each other. The media types multipart/* are used to indicate multipart content. Such types are handled according to the general rules described in RFC 2046 (unless a specific media type defines otherwise). If the recipient does not know how to handle a type, it treats it the same way as multipart/mixed.

    The boundary parameter specifies the delimiter between the parts of the message. In the example below, the DestAddress part carries the e-mail address entered in the form, and the AttachedFile1 part further on carries the binary content of an image in .jpg format.
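
    How the boundary delimits the parts can be sketched with a small body builder. The function name build_multipart, the Part type and the sample address are assumptions made for this illustration; a real client would also choose a boundary that cannot occur inside the data.

```python
from typing import Dict, Tuple, Union

# A part is either a text value or a (filename, content type, data) tuple.
Part = Union[str, Tuple[str, str, bytes]]


def build_multipart(fields: Dict[str, Part], boundary: str) -> bytes:
    """Build a multipart/form-data body; each part starts with --boundary."""
    delimiter = b"--" + boundary.encode("ascii")
    chunks = []
    for name, value in fields.items():
        chunks.append(delimiter)
        if isinstance(value, tuple):
            filename, content_type, data = value
            disposition = (f'Content-Disposition: form-data; name="{name}"; '
                           f'filename="{filename}"')
            chunks.append(disposition.encode("utf-8"))
            chunks.append(f"Content-Type: {content_type}".encode("ascii"))
        else:
            data = value.encode("utf-8")
            disposition = f'Content-Disposition: form-data; name="{name}"'
            chunks.append(disposition.encode("utf-8"))
        chunks.append(b"")                  # empty line before the part content
        chunks.append(data)
    chunks.append(delimiter + b"--")        # the closing delimiter ends with "--"
    chunks.append(b"")
    return b"\r\n".join(chunks)


if __name__ == "__main__":
    body = build_multipart({
        "DestAddress": "user@example.test",             # hypothetical address
        "AttachedFile1": ("photo.jpg", "image/jpeg", b"\xff\xd8\xff"),
    }, "Asrf456BGe4h")
    print(body)
```

    The value passed as boundary is the same one announced in the Content-Type header of the request, which is how the receiver knows where each part begins and ends.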

    On the server side, messages with multiple contents can be sent in response to requests for multiple resource fragments. In this case, the media type multipart/byteranges is used.

    On the client side, the POST method is most often used when submitting an HTML form. A typical example is a page for sending e-mail with file attachments. When such a message is sent, the browser generates a request of type multipart/form-data and places the user-entered subject, recipient address, message text and attached files into it as separate parts:

        POST /send-message.html HTTP/1.1
        Host: mail.example.com
        Referer: http://mail.example.com/send-message.html
        User-Agent: BrowserForDummies/4.67b
        Content-Type: multipart/form-data; boundary="Asrf456BGe4h"
        Content-Length: (total volume including child headers)
        Connection: keep-alive
        Keep-Alive: 300
        (empty line)
        (missing preamble)
        --Asrf456BGe4h
        Content-Disposition: form-data; name="DestAddress"
        (empty line)
        [email protected]
        --Asrf456BGe4h
        Content-Disposition: form-data; name="MessageTitle"
        (empty line)
        I'm indignant
        --Asrf456BGe4h
        Content-Disposition: form-data; name="MessageText"
        (empty line)
        Hello Vasily! Your pet lion, which you left with me last week, tore up my entire sofa. Please pick him up soon! Attached are two photos with the consequences.
        --Asrf456BGe4h
        Content-Disposition: form-data; name="AttachedFile1"; filename="horror-photo-1.jpg"
        Content-Type: image/jpeg
        (empty line)
        (binary content of the first photo)
        --Asrf456BGe4h
        Content-Disposition: form-data; name="AttachedFile2"; filename="horror-photo-2.jpg"
        Content-Type: image/jpeg
        (empty line)
        (binary content of second photo)
        --Asrf456BGe4h--
        (missing epilogue)

    In the example in the Content-Disposition headers, the name parameter corresponds to name attribute in HTML tags And