What is HTTP and how does it work: HTTP in simple terms
Purpose of the lecture: to build an understanding of how the HTTP and HTTPS protocols work.
HTTP (HyperText Transfer Protocol) is one of the most important protocols for transferring data over the Internet. HTTP operates at the seventh (application) layer of the OSI model and runs on top of TCP.
Because HTTP is an application-layer protocol, applications can use it directly to communicate over the network. HTTP is also a critical part of web applications: the browser uses HTTP to interact with the server and obtain the data it needs.
The HTTP protocol involves data transfer in "request-response" mode. Moreover, within the framework of such interaction, data of almost any type can be transmitted - plain text, hypertext (HTML), style sheets, client scripts, images, documents in various formats, binary information, etc.
Within the HTTP protocol, there is always a clear distinction between client and server. The client is always the initiator of the interaction. The server, in turn, listens for all incoming connections and processes each of them. Since HTTP communication operates on a request-response basis, an HTTP request must be generated to initiate a data transfer session. As part of this request, the client describes what resource it wants to receive from the server, and also specifies various additional options. After this, the request is sent to the server and it, in turn, processes the request and generates an HTTP response, which contains service information and the contents of the resource that was requested. In general, the process can be schematically depicted as follows.
An HTTP request and an HTTP response are similar in structure and are called HTTP messages. In fact, all interaction within the HTTP protocol comes down to forwarding HTTP messages. Each HTTP message is plain text information presented in a specific format. Let's take a closer look at the HTTP message format.
Each HTTP message consists of several lines. The first line is the start line; it differs significantly between an HTTP request and an HTTP response and contains general information about the message. The start line is followed by the HTTP headers, one header per line. HTTP headers are present in both requests and responses; their purpose is to describe the message in more detail so that the receiving party can process it correctly. The number of headers is variable and depends on the specific message: the sender includes a header if it considers it necessary and omits it otherwise. A header consists of a name and a value separated by a colon, and the name defines the header's purpose. The headers are followed by an empty line, and then by the message body. Thus, the general structure of HTTP messages can be represented as follows.
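To make this structure concrete, here is a minimal Python sketch that assembles a message from its three parts. It assumes the CRLF (`\r\n`) line endings used by HTTP/1.1; the function name `build_message` is hypothetical and serves only as an illustration.

```python
def build_message(start_line: str, headers: dict[str, str], body: bytes = b"") -> bytes:
    """Assemble an HTTP message: start line, headers, empty line, body."""
    lines = [start_line]
    # Each header goes on its own line in "Name: value" form.
    for name, value in headers.items():
        lines.append(f"{name}: {value}")
    # The header block is terminated by an empty line (two CRLFs in a row).
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


msg = build_message("GET /index.html HTTP/1.1", {"Host": "example.com"})
print(msg)
```

The same function works for responses, since only the start line differs, for example `build_message("HTTP/1.1 200 OK", {"Content-Length": "2"}, b"hi")`.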
An HTTP request is generated on the client and sent to the server in order to obtain information from it. It contains information about the resource to be retrieved, as well as additional information. The first line contains the request method (discussed later in this lecture), the resource name (a path relative to the server root), and the protocol version. For example, the start line could be "GET /images/corner1.png HTTP/1.1". Such a request asks the server to return an image named "corner1.png" located in the "images" folder. HTTP headers are important in a request because they carry clarifying information: the browser version, whether the client can accept compressed content, caching capabilities, and other parameters that can influence how the response is formed. The body of an HTTP request usually contains data that needs to be sent to the server. For example, when uploading a file, the contents of the file are placed in the request body. However, not every HTTP method uses a request body: with the GET method, for example, the body is normally empty. So a standard HTTP request might look like this.
In this request, the client contacts the server "microsoft.com", requests the resource "images/corner.png", and indicates that it can accept content compressed with "gzip" or "deflate", that its language is English, and which browser version it is using. As noted earlier, the number and set of headers can vary significantly. Here is another example of an HTTP request.
This request differs from the previous one in that it uses the POST method, which sends data to the server. The data itself is contained in the request body, after the empty line.
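A request of this kind can be written out as raw text. The following Python sketch is modeled on the request described in the next paragraph; the exact header values are assumptions, and the `User-Agent` value in particular is a made-up placeholder.

```python
# Hypothetical reconstruction of a GET request for an image on microsoft.com.
request_lines = [
    "GET /images/corner.png HTTP/1.1",
    "Host: microsoft.com",
    "Accept-Encoding: gzip, deflate",   # client can accept compressed content
    "Accept-Language: en",              # preferred language: English
    "User-Agent: ExampleBrowser/1.0",   # placeholder browser name and version
]
# The headers end with an empty line; a GET request normally has no body.
raw_request = ("\r\n".join(request_lines) + "\r\n\r\n").encode("ascii")
print(raw_request.decode("ascii"))
```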
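A minimal sketch of such a POST request in Python, assuming a hypothetical `/login` endpoint on `example.com` and form-encoded fields. It shows the two things that distinguish it from the GET request: a `Content-Length` header that gives the body size in bytes, and the body itself after the empty line.

```python
from urllib.parse import urlencode

# Hypothetical form fields sent to a hypothetical /login endpoint.
body = urlencode({"user": "alice", "lang": "en"}).encode("ascii")

head = (
    "POST /login HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(body)}\r\n"   # size of the body in bytes
    "\r\n"                               # empty line separates headers from body
)
raw_request = head.encode("ascii") + body
print(raw_request.decode("ascii"))
```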
An HTTP response is generated by the web server in reply to an incoming HTTP request. It is similar in structure to an HTTP request but has certain differences. The main difference is in the first line: instead of the request method and the name of the requested resource, it contains the status of the response. The status indicates how successful the request was. For example, if a document is found on the server and can be returned to the client, the status is "OK", meaning that the request was completed successfully. However, exceptional situations may occur: for example, the document may not exist on the server, or the user may not have permission to access the resource. The various HTTP status codes are covered later in this lecture. Thus, the first line of an HTTP response may be "HTTP/1.1 200 OK". The headers of an HTTP response are also important: they describe the content being sent to the client, such as its type (HTML document, image, etc.), its length in bytes, its modification date, and its caching mode. These headers affect how the data is displayed on the client and set the rules for storing it in the client cache. A typical HTTP response might look like this:
In the example above, the server indicates that the resource has been found, its type is an HTML document, and also indicates the size and modification date. After the empty line comes the contents of the HTML document, i.e. essentially what the client requested. As with an HTTP request, the number of headers in an HTTP response can vary at the discretion of the web server.
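On the client side, such a response can be split back into its parts. The Python sketch below is simplified: it assumes the whole response is already in memory, ignores chunked transfer encoding, and keeps only the last value of a repeated header. The function name `parse_response` is hypothetical.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.1 response into version, status, reason, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")        # the empty line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ", 2)                    # e.g. "HTTP/1.1 200 OK"
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""       # the reason phrase may be empty
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return version, code, reason, headers, body


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html\r\n"
       b"Content-Length: 13\r\n"
       b"\r\n"
       b"<p>Hello</p>\n")
version, code, reason, headers, body = parse_response(raw)
print(code, reason, headers["content-type"], body)
```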
When we looked at the structure of an HTTP request, we touched on the concept of the HTTP request method. The method determines how the request will be processed, i.e. in a sense it defines the request's semantics. Since HTTP requests can have a wide variety of meanings (requesting a resource from the server, creating or modifying a resource on the server, deleting a resource, etc.), specifying the method is an important part of constructing an HTTP request.
The most common HTTP request methods are the following:
Method | Description
---|---
GET | retrieves information from the server; the request body is normally empty
HEAD | similar to GET, but the response body is always empty; used to check that the requested resource is available and to read the response headers
POST | sends data to the server to be processed by the target resource; often used to create a new resource, in which case the request body contains the data for the resource being created or changed
PUT | creates or replaces the resource at the specified URI; the request body contains the new version of the resource
DELETE | removes a resource from the server
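Two properties of these methods matter in practice: whether a method is "safe" (not intended to change anything on the server) and whether it is "idempotent" (repeating it has the same intended effect as sending it once). The sketch below records these properties as defined in RFC 9110 (and RFC 5789 for PATCH); the names `METHODS` and `is_safe_to_retry` are hypothetical helpers.

```python
# Method properties per RFC 9110 (PATCH per RFC 5789).
# "safe": the request is not meant to change server state.
# "idempotent": repeating the request has the same intended effect as sending it once.
METHODS = {
    "GET":    {"safe": True,  "idempotent": True},
    "HEAD":   {"safe": True,  "idempotent": True},
    "POST":   {"safe": False, "idempotent": False},
    "PUT":    {"safe": False, "idempotent": True},
    "DELETE": {"safe": False, "idempotent": True},
    "PATCH":  {"safe": False, "idempotent": False},
}


def is_safe_to_retry(method: str) -> bool:
    """Idempotent requests can be repeated automatically, e.g. after a connection failure."""
    return METHODS.get(method.upper(), {}).get("idempotent", False)


print(is_safe_to_retry("PUT"), is_safe_to_retry("POST"))
```

Unknown methods are treated as non-idempotent, which is the cautious default.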
In addition to these methods, the HTTP specification defines a number of others. Nevertheless, browsers often use only GET and POST (plain HTML forms, for instance, can submit only these two), while other applications may use any HTTP method as needed.
As we saw earlier, an HTTP response contains a status code (return code), which tells the client the outcome of its request. This mechanism is essential to HTTP because many non-standard situations can occur. All status codes are three-digit numbers, and the response may also contain a short text description of the status. All status codes are divided into five groups.
Each group of status codes identifies a general kind of outcome, and the group is determined by the first digit of the code. For example, codes in the 2xx group indicate that the request succeeded. The most commonly used status codes are shown in the table below.
Code | Description
---|---
1xx | Informational codes
2xx | Successful completion of the request
200 | OK: the request was processed successfully
201 | Created: a new resource was created
202 | Accepted: the request was accepted for processing
203 | Non-Authoritative Information: the returned information does not come directly from the origin server (for example, it was modified by a proxy)
204 | No Content
205 | Reset Content
206 | Partial Content (for example, when a file is downloaded in parts)
3xx | Redirection (further action is required to complete the request)
300 | Multiple Choices: several options to choose from
301 | Moved Permanently
302 | Found: the resource has moved temporarily
303 | See Other: see another resource
304 | Not Modified: the content has not changed
305 | Use Proxy: use a proxy server (now deprecated)
4xx | Client error: the problem is not with the server but with the request
400 | Bad Request
401 | Unauthorized: authentication is required to access the resource
402 | Payment Required
403 | Forbidden: access denied
404 | Not Found: the resource was not found
405 | Method Not Allowed
406 | Not Acceptable: no version of the resource matches the client's Accept headers
407 | Proxy Authentication Required
408 | Request Timeout
409 | Conflict
410 | Gone: the resource no longer exists
411 | Length Required
412 | Precondition Failed
413 | Request Entity Too Large
414 | Request-URI Too Long: the resource identifier is too long
415 | Unsupported Media Type
5xx | Server errors
500 | Internal Server Error
501 | Not Implemented
502 | Bad Gateway
503 | Service Unavailable
504 | Gateway Timeout
505 | HTTP Version Not Supported
These and other status codes are used to inform the client about the outcome of its request to the server.
A distinctive feature of HTTP is that information is transmitted as text, which makes the protocol quite easy to work with. In addition, security administrators usually leave HTTP traffic permitted even under strict security policies. This makes HTTP an attractive choice for implementing network communication.
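Because the class is given by the first digit, a client can react sensibly even to a code it does not know. A minimal Python sketch; the names `STATUS_CLASSES` and `status_class` are hypothetical.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code: int) -> str:
    """Return the class of a status code, determined by its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


print(status_class(404))  # Client Error
print(status_class(299))  # an unlisted code is still handled by its class: Success
```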
However, despite the simplicity of the protocol, there is a problem of leakage of transmitted information. Since information is transmitted in plain text, intercepting such information is quite simple. In some situations this problem is not critical. However, for web applications that work with confidential information, this is a fairly significant drawback.
For this reason, there is a modification of this protocol - HTTPS, i.e. HTTP protocol with encryption support.
As you know, there are classical strong encryption algorithms that encrypt data using a key. The same key is used to encrypt and decrypt the data, so anyone who knows the key can decrypt the information. A key is simply a sequence of bits of a certain length; the longer the key, the harder it is to break the encryption. Therefore, to protect your information, you need to keep the encryption key secret. But how can this be achieved when communicating over HTTP? If the key is transmitted in clear text, encryption becomes pointless. To solve this, a second type of encryption is used: asymmetric encryption. It uses a pair of keys, public and private. Information encrypted with the public key can only be decrypted with the private key. Usually, the private key is kept secret and the public key is made publicly available. However, asymmetric algorithms are slower than symmetric ones, so they are used only for the initial exchange of a symmetric key. Let's look at how an encrypted HTTP connection works as a whole.
When a client contacts a server over a secure channel, the server holds a public and a private key. First, the server sends its asymmetric public key to the client. The client randomly generates a symmetric encryption key, encrypts it with the public key received from the server, and sends the encrypted key to the server, which decrypts it with its private key. At this point, the client and the server share the same symmetric key. The HTTP exchange that follows is encrypted with this symmetric key. An eavesdropper cannot recover the symmetric key, because the private key (the only key that can decrypt the message carrying the symmetric key) never leaves the server. In practice, the server's public key is delivered in a certificate signed by a trusted certificate authority, which lets the client check that the key really belongs to the server and not to an attacker in the middle. This is the basis on which HTTPS protects the confidentiality of data transmitted over HTTP.
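The exchange described above can be traced with a deliberately insecure toy in Python. Two loud assumptions: the "asymmetric" part is textbook RSA with tiny primes (61 and 53), and the "symmetric" part is a XOR keystream derived with SHA-256, a stand-in for a real cipher such as AES. Neither is safe for real use; the sketch only shows the order of the steps.

```python
import hashlib
import secrets

# --- Server side: a toy RSA key pair (textbook numbers, far too small to be secure) ---
P, Q = 61, 53
N = P * Q      # 3233, shared by both keys
E = 17         # public exponent: (E, N) is the public key
D = 2753       # private exponent: (D, N) is the private key; E*D = 1 mod (P-1)*(Q-1)


def keystream_xor(key: int, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHA-256-derived keystream.
    The same call both encrypts and decrypts."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(f"{key}:{counter}".encode()).digest())
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))


# 1. The server sends its public key (E, N) to the client.
# 2. The client picks a random symmetric key and encrypts it with the public key.
session_key = secrets.randbelow(N - 2) + 2          # random integer in [2, N-1]
encrypted_key = pow(session_key, E, N)
# 3. The server decrypts the symmetric key with its private key.
server_key = pow(encrypted_key, D, N)
# 4. Both sides now share the key and use it to encrypt the HTTP traffic.
ciphertext = keystream_xor(session_key, b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
plaintext = keystream_xor(server_key, ciphertext)
print(plaintext)
```

Only `encrypted_key` and `ciphertext` travel over the network; without `D`, an observer cannot compute `session_key`.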
Brief summary
Web applications are based on the HTTP protocol. HTTP transmits text information and operates in request-response mode. HTTP requests and responses have a strictly defined structure: a start line, headers, and a message body. The number of HTTP headers is variable, and the headers are separated from the message body by an empty line. Every HTTP request specifies an HTTP method, which determines the semantics of the request (get a resource, add, change, delete, etc.). In addition to service information and useful data, an HTTP response carries a status code that tells the client whether the request succeeded. Status codes are divided into groups. Since data transmitted over HTTP can be intercepted, HTTP does not ensure the confidentiality of transmitted information. When that level of security is needed, use HTTPS, which encrypts transmitted data using a combination of symmetric and asymmetric encryption algorithms.
It is the ability to specify how a message is encoded that allows the client and server to exchange binary data, even though HTTP itself is a text protocol.
GET
Besides the ordinary GET request, there are conditional and partial GET requests. Conditional GET requests contain If-Modified-Since, If-Match, If-Range, or similar headers. Partial GET requests contain a Range header. The processing order for such requests is defined separately by the standards.
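As an illustration of a conditional GET, here is a Python sketch of the server-side check for If-Modified-Since. The function name is hypothetical, and it covers only this one header: if the resource has not changed since the given date, the server answers 304 (Not Modified) without a body, and otherwise it sends the full resource with 200.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def conditional_get_status(if_modified_since: Optional[str], last_modified: datetime) -> int:
    """Server-side check for a conditional GET with If-Modified-Since."""
    if if_modified_since is None:
        return 200                                   # unconditional request
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200                                   # invalid date: ignore the header
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)   # treat zone-less dates as UTC
    # HTTP dates have one-second resolution, so compare without microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304                                   # Not Modified: no body is sent
    return 200                                       # changed: send the full resource


modified = datetime(2015, 10, 21, 7, 28, tzinfo=timezone.utc)
header = format_datetime(modified, usegmt=True)      # "Wed, 21 Oct 2015 07:28:00 GMT"
print(conditional_get_status(header, modified))      # 304
```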
HEAD
Similar to the GET method, except that there is no body in the server response. The HEAD request is typically used to retrieve metadata, check for the existence of a resource (URL validation), and see if it has changed since it was last accessed.
Response headers may be cached. If a resource's metadata does not match the corresponding information in the cache, the copy of the resource is marked as out of date.
POST
Used to transfer user data to a specified resource. For example, on blogs, visitors can typically enter comments on posts into an HTML form, after which they are POSTed to the server and placed on the page. In this case, the transmitted data (in the example with blogs, the text of the comment) is included in the body of the request. Similarly, using the POST method, files are usually uploaded to the server.
Unlike GET, the POST method is not considered idempotent: repeating the same POST request may produce different results (for example, each submission of a comment adds another copy of that comment).
If the result is 200 (OK), the response body should contain a message about the outcome of the request. If a resource has been created, the server SHOULD return 201 (Created) with the URI of the new resource in the Location header.
Responses to POST are not cached unless the response explicitly permits it with Cache-Control or Expires headers.
PUT
Used to upload the request content to the URI specified in the request. If no resource exists at that URI, the server creates it and returns 201 (Created). If an existing resource was modified, the server returns 200 (OK) or 204 (No Content). The server MUST NOT ignore Content-* headers sent by the client along with the message: if any of these headers cannot be recognized or is not valid in the current context, the server must return 501 (Not Implemented).
The fundamental difference between POST and PUT lies in how the resource URI is interpreted. With POST, the URI identifies the resource that will process the content sent by the client. With PUT, the client asserts that the uploaded content is the resource located at the given URI.
Server response messages to the PUT method are not cached.
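The difference between POST and PUT, and why only PUT is idempotent, can be seen on a toy in-memory server. The class `ToyStore` and its URIs are hypothetical and exist only for this illustration; a real server would also send the new URI in the Location header.

```python
import itertools


class ToyStore:
    """Hypothetical in-memory server illustrating POST vs PUT semantics."""

    def __init__(self):
        self.resources: dict[str, str] = {}
        self._ids = itertools.count(1)

    def post(self, collection: str, content: str) -> tuple[int, str]:
        # POST: the target URI (the collection) processes the content and
        # creates a new subordinate resource with a URI chosen by the server.
        uri = f"{collection}/{next(self._ids)}"
        self.resources[uri] = content
        return 201, uri                      # 201 Created; the URI goes in Location

    def put(self, uri: str, content: str) -> int:
        # PUT: the content *is* the resource at the given URI.
        created = uri not in self.resources
        self.resources[uri] = content
        return 201 if created else 204       # 201 Created or 204 No Content


store = ToyStore()
store.post("/comments", "Nice post!")
store.post("/comments", "Nice post!")   # a repeated POST creates a second copy
store.put("/pages/about", "About us")
store.put("/pages/about", "About us")   # a repeated PUT still leaves one resource
print(sorted(store.resources))          # ['/comments/1', '/comments/2', '/pages/about']
```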
PATCH
Similar to PUT, but applies only to a fragment of the resource.
DELETE
Deletes the specified resource.
TRACE
Returns the received request so that the client can see what information intermediate servers add or change in the request.
LINK
Establishes a connection between the specified resource and others.
UNLINK
Removes the connection of the specified resource with others.
CONNECT
Converts the request connection into a transparent TCP/IP tunnel, typically to allow a secure SSL connection to be established through an unencrypted proxy.
Status codes
The status code is part of the first line of the server response. It is an integer of three decimal digits, and the first digit indicates the class of the status. The code is usually followed by a space and an explanatory phrase in English that explains the reason for the response to a human reader. Examples:
201 Webpage Created
403 Access allowed only for registered users
507 Insufficient Storage
The client learns from the response code about the results of its request and determines what actions to take next. The set of status codes is a standard and they are described in the corresponding RFCs. The introduction of new codes should be made only after agreement with the IETF. The client may not know all status codes, but it must respond according to the class of the code.
There are currently five classes of status codes.
1xx Informational. This class contains codes that report on the progress of the transfer. In HTTP/1.0, messages with such codes should be ignored. In HTTP/1.1, the client must be prepared to accept messages of this class as a normal response, but does not need to send anything back to the server. These messages contain only the start line of the response and, if required, a few response-specific header fields. Proxy servers must forward such messages from the server to the client.
2xx Success. Messages of this class report that the client's request was successfully received and processed. Depending on the status, the server may also send headers and a message body.
3xx Redirection. Codes of class 3xx tell the client that another request (usually to a different URI) is needed to complete the operation successfully. Five of the codes in this class relate directly to redirection. The server specifies the address for the new request in the Location header; fragments may be used in the target URI.
4xx Client Error. The 4xx class indicates errors on the client side. For all methods except HEAD, the server must return a hypertext explanation for the user in the message body.
To help remember the meanings of codes 400 to 417, there are illustrative mnemonic techniques.
5xx Server Error. Codes 5xx are reserved for cases where an operation fails through the fault of the server. For all methods except HEAD, the server must include in the message body an explanation that the client will display to the user.
Headers
Message body
The HTTP message body (message-body), if present, is used to convey the body of the object associated with the request or response. The message-body differs from the entity-body only when transfer encoding is applied, as indicated by the Transfer-Encoding header field.
message-body = entity-body | <entity-body encoded as per Transfer-Encoding>
The Transfer-Encoding field must be used to indicate any transfer encoding applied by the application to ensure that the message is transmitted securely and correctly. The Transfer-Encoding field is a property of the message, not the object, and thus can be added or removed by any application in the request/response chain.
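The most common transfer encoding is "chunked", in which the body is sent as a series of chunks, each preceded by its size in hexadecimal, and terminated by a chunk of size zero. A simplified Python decoder follows; the function name is hypothetical, chunk extensions are skipped, and trailer fields after the last chunk are not parsed.

```python
def decode_chunked(data: bytes) -> bytes:
    """Decode a message body sent with Transfer-Encoding: chunked.
    Each chunk is: size in hex, CRLF, that many bytes, CRLF.
    A chunk of size zero ends the body."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size_field = data[pos:line_end].split(b";", 1)[0]   # drop chunk extensions
        size = int(size_field, 16)
        pos = line_end + 2
        if size == 0:
            break                        # last chunk; trailer fields (if any) follow
        body += data[pos:pos + size]
        pos += size + 2                  # skip the chunk data and its trailing CRLF
    return bytes(body)


print(decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"))  # b'Wikipedia'
```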
The rules governing whether a message body is acceptable in a message are different for requests and responses.
The presence of a message body in a request is indicated by adding a Content-Length or Transfer-Encoding header field to the request headers. A message body (message-body) MAY be added to a request only when the request method allows an entity-body.
Whether a response includes a message body depends on both the request method and the response status code. Responses to a request with the HEAD method must not include a message body, even if entity-header fields are present that suggest an entity. Responses with status codes 1xx (Informational), 204 (No Content), and 304 (Not Modified) must not contain a message body. All other responses contain a message body, even if it has zero length.
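The rules above for requests and responses can be written as two small Python predicates. The function names are hypothetical, and they encode only the rules stated in this section.

```python
def request_has_body(headers: dict[str, str]) -> bool:
    """A request signals a body with a Content-Length or Transfer-Encoding header."""
    names = {name.lower() for name in headers}
    return "content-length" in names or "transfer-encoding" in names


def response_has_body(request_method: str, status: int) -> bool:
    """May a response with this status, sent for this request method, carry a body?"""
    if request_method.upper() == "HEAD":
        return False            # responses to HEAD never include a body
    if 100 <= status < 200 or status in (204, 304):
        return False            # 1xx, 204 No Content, 304 Not Modified
    return True                 # all other responses have a body, possibly of zero length


print(response_has_body("GET", 200), response_has_body("HEAD", 200))  # True False
```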
HTTP Dialog Examples
Regular GET request
Content negotiation
There are two main types of content negotiation:
- Server-driven negotiation.
- Client-driven (agent-driven) negotiation.
The two types can be used separately or together.
The main protocol specification (RFC 2616) also describes so-called transparent negotiation as the preferred way of combining both types. This mechanism should not be confused with the separate technology Transparent Content Negotiation (TCN, see RFC 2295), which is not part of HTTP but can be used with it. The two differ significantly in how they work and in what "transparent" means. In the HTTP specification, transparency means that the process is invisible to the client and the server, whereas in TCN it means that the full list of resource variants is available to all participants in the delivery process.
Server-driven negotiation
If a resource has several versions, the server can analyze the client's request headers to return the version it considers most appropriate. The main headers analyzed are Accept, Accept-Charset, Accept-Encoding, Accept-Language, and User-Agent. The server should include a Vary header in the response listing the request headers by which the content at the requested URI varies.
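A minimal Python sketch of server-driven negotiation on Accept-Language: parse the quality values (`q`) and pick the best version the server actually has. The function names are hypothetical, and the matching is simplified (an exact tag, or its primary subtag such as "en" for "en-US"). A server doing this would also send `Vary: Accept-Language` in the response.

```python
def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept-* header like 'ru, en;q=0.8' into (value, q) pairs, best first."""
    items = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0                                   # default quality when q is omitted
        for param in fields[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        items.append((fields[0].lower(), q))
    return sorted(items, key=lambda item: item[1], reverse=True)


def choose_language(accept_language: str, available: list[str], default: str) -> str:
    """Pick the available version that the client rates highest."""
    for lang, q in parse_accept(accept_language):
        if q <= 0:
            continue                              # q=0 means "not acceptable"
        if lang == "*" and available:
            return available[0]
        for candidate in (lang, lang.split("-")[0]):   # "en-us" falls back to "en"
            if candidate in available:
                return candidate
    return default


print(choose_language("fr-CH, fr;q=0.9, en;q=0.8", ["en", "ru"], "en"))  # en
print(choose_language("ru-RU, en;q=0.5", ["en", "ru"], "en"))            # ru
```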
The client's geographic location can also be estimated from its IP address. This is possible because IP addresses, like domain names, are registered to specific persons or organizations, and the registration specifies the region where the address space will be used. This data is publicly available, and freely available databases and ready-made software modules for working with it can be found on the Internet (search for the keyword "GeoIP").
Keep in mind that this method can determine location at best down to the city (and therefore the country). Moreover, the information is accurate only as of the time the address space was registered. For example, if a Moscow provider registers an address range as being in Moscow and then starts serving clients in the nearby Moscow region, its subscribers may see on some sites that they are in Moscow rather than in Krasnogorsk or Dzerzhinsky.
Server-driven negotiation has several disadvantages:
- The server only guesses which variant the end user prefers, but cannot know exactly what is needed at the moment (for example, the Russian or the English version).
- Many Accept-* headers are sent, but few resources actually have multiple variants, so this creates unnecessary load.
- A shared cache is limited in its ability to return the same response to identical requests from different users.
- Sending Accept headers can also reveal information about the user's preferences, such as languages, browser, and encoding.
Client-driven negotiation
In this case, the variant is chosen on the client side. The server responds with status code 300 (Multiple Choices) or 406 (Not Acceptable) and a list of variants, from which the user selects the appropriate one. Client-driven negotiation works well when content varies along common dimensions (such as language and encoding) and a public cache is used.
The main disadvantage is the extra request needed to obtain the desired content.
Transparent negotiation
This negotiation is completely transparent to the client and server. In this case, a shared cache is used that contains a list of options, similar to client-driven negotiation. If the cache understands all these options, then it makes the choice itself, as in server-driven negotiation. This reduces the load on the origin server and eliminates the additional request from the client.
The core HTTP specification does not describe the transparent negotiation mechanism in detail.
Multipart content
HTTP supports transferring multiple entities within a single message. Entities can be transmitted not only as a flat sequence but also as a hierarchy, with parts nested inside each other. Multipart content is indicated by the multipart/* media types. These types are handled according to the general rules described in RFC 2046 (unless a specific media type defines otherwise). If the recipient does not know how to handle a particular multipart type, it treats it as multipart/mixed.
The boundary parameter defines the delimiter that separates the individual parts of the message. For example, in the message below, the DestAddress part carries the e-mail address entered in the form, and the AttachedFile1 part carries the binary content of a .jpg image.
On the server side, messages with multiple contents can be sent in response to requests for multiple resource fragments. In this case, the media type multipart/byteranges is used.
On the client side, the POST method is most often used when submitting an HTML form. A typical example is a webmail page for sending messages with attachments: when such a message is sent, the browser generates a multipart/form-data message that includes, as separate parts, the recipient's address, the subject, the message text, and the attached files entered by the user:
POST /send-message.html HTTP/1.1
Host: mail.example.com
Referer: http://mail.example.com/send-message.html
User-Agent: BrowserForDummies/4.67b
Content-Type: multipart/form-data; boundary="Asrf456BGe4h"
Content-Length: (total volume including child headers)
Connection: keep-alive
Keep-Alive: 300
(empty line)
(missing preamble)
--Asrf456BGe4h
Content-Disposition: form-data; name="DestAddress"
(empty line)
[email protected]
--Asrf456BGe4h
Content-Disposition: form-data; name="MessageTitle"
(empty line)
I'm indignant
--Asrf456BGe4h
Content-Disposition: form-data; name="MessageText"
(empty line)
Hello Vasily! Your pet lion, which you left with me last week, tore up my entire sofa. Please pick him up soon! Attached are two photos with the consequences.
--Asrf456BGe4h
Content-Disposition: form-data; name="AttachedFile1"; filename="horror-photo-1.jpg"
Content-Type: image/jpeg
(empty line)
(binary content of the first photo)
--Asrf456BGe4h
Content-Disposition: form-data; name="AttachedFile2"; filename="horror-photo-2.jpg"
Content-Type: image/jpeg
(empty line)
(binary content of the second photo)
--Asrf456BGe4h--
(missing epilogue)
In this example, the name parameter of each Content-Disposition header corresponds to the name attribute of the matching HTML form element.
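A body of this shape can be generated programmatically. The Python sketch below reuses the boundary and field names from the example; the JPEG bytes are a placeholder, and the function `build_multipart` is hypothetical. For simplicity it does not escape quotes in field names and assumes the boundary string never occurs inside the data.

```python
def build_multipart(fields: dict[str, str],
                    files: dict[str, tuple[str, str, bytes]],
                    boundary: str) -> bytes:
    """Build a multipart/form-data body.
    files maps a field name to (filename, content type, data)."""
    dash = f"--{boundary}".encode("ascii")
    parts = []
    for name, value in fields.items():
        parts.append(dash + b"\r\n"
                     + f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode("utf-8")
                     + value.encode("utf-8") + b"\r\n")
    for name, (filename, ctype, data) in files.items():
        parts.append(dash + b"\r\n"
                     + (f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
                        f"Content-Type: {ctype}\r\n\r\n").encode("utf-8")
                     + data + b"\r\n")
    return b"".join(parts) + dash + b"--\r\n"      # the closing delimiter ends with "--"


body = build_multipart(
    {"MessageTitle": "I'm indignant"},
    {"AttachedFile1": ("horror-photo-1.jpg", "image/jpeg", b"\xff\xd8\xff")},  # placeholder bytes
    boundary="Asrf456BGe4h",
)
content_type = 'multipart/form-data; boundary="Asrf456BGe4h"'
content_length = len(body)   # the value to send in the Content-Length header
```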
Protocol Features
Most protocols establish a TCP session in which authorization happens once, and further actions are performed in the context of that authorization. HTTP, by contrast, originally established a separate TCP session for each request. Later versions of HTTP allow multiple requests over a single TCP session, but browsers typically request only the page and its embedded objects (images, style sheets, etc.) and then close the TCP session. To support authorized (non-anonymous) access, HTTP uses cookies; this method even allows a session to survive a restart of the client or the server.
When data is accessed via FTP or the file protocol, the file type (more precisely, the type of data it contains) is determined by the file name extension, which is not always convenient. Before transmitting the data itself, HTTP sends the "Content-Type: type/subtype" header, which lets the client determine unambiguously how to process the data. This is especially important when working with CGI scripts, where the file name extension indicates not the type of data sent to the client but that the file must be run on the server and its output sent to the client. In that case, the same file can generate responses of different types depending on the request arguments and its own logic (in the simplest case, images in different formats).
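The cookie round trip can be sketched with Python's standard `http.cookies` module: the server sends a `Set-Cookie` header, and the client returns the value in a `Cookie` header with later requests. The cookie name `session_id` and its value are made-up placeholders.

```python
from http.cookies import SimpleCookie

# Server side: create a session cookie (name and value are placeholders).
server_cookie = SimpleCookie()
server_cookie["session_id"] = "abc123"
server_cookie["session_id"]["path"] = "/"
server_cookie["session_id"]["max-age"] = 3600   # the client may store it for an hour
set_cookie_header = server_cookie.output(header="Set-Cookie:")
print(set_cookie_header)   # Set-Cookie: session_id=abc123; Max-Age=3600; Path=/

# Client side: parse Set-Cookie and send the value back in the Cookie header.
client_jar = SimpleCookie()
client_jar.load(set_cookie_header.split(":", 1)[1].strip())
cookie_header = "; ".join(f"{key}={morsel.value}" for key, morsel in client_jar.items())
print("Cookie:", cookie_header)   # Cookie: session_id=abc123
```

Because the client stores the cookie and presents it with every request, the server can recognize the session even though each request may arrive over a new TCP connection.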
In addition, HTTP allows the client to send parameters to the server that will be passed to the CGI script being launched. For this purpose, forms were introduced into HTML.
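For a GET form, the browser encodes the form fields into the query string of the URL, and a CGI script receives that string (in the QUERY_STRING environment variable) and decodes it. A Python sketch with a hypothetical search form and script path:

```python
from urllib.parse import parse_qs, urlencode

# Browser side: the fields of a hypothetical search form become a query string.
query = urlencode({"q": "http protocol", "page": "2"})
request_line = f"GET /cgi-bin/search?{query} HTTP/1.1"
print(request_line)   # GET /cgi-bin/search?q=http+protocol&page=2 HTTP/1.1

# Server side: the CGI script decodes the query string back into parameters.
params = parse_qs(query)
print(params)         # {'q': ['http protocol'], 'page': ['2']}
```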
These features of HTTP made it possible to create search engines (such as AltaVista, created by DEC), forums, and online stores. This helped commercialize the Internet: companies emerged whose main business was providing Internet access (providers) and creating websites.
This article describes the main aspects of HTTP, the network protocol that since the early 1990s has allowed your browser to load web pages. It is written for those who are just starting to work with computer networks and develop network applications and who still find it difficult to read the official specifications on their own.
HTTP is a widely used data transfer protocol, originally intended for transferring hypertext documents (that is, documents that may contain links allowing navigation to other documents).
The abbreviation HTTP stands for HyperText Transfer Protocol. In terms of the OSI model, HTTP is an application-layer (upper, 7th layer) protocol. When this article was written, the current version of the protocol was HTTP/1.1, described in RFC 2616; that specification has since been superseded by RFC 7230-7235 and later by RFC 9110 and RFC 9112.
HTTP uses a client-server model of data transfer. The client application generates a request and sends it to the server. The server software processes the request, generates a response and sends it back to the client. After that, the client application may send further requests, which are processed in the same way.
The task traditionally solved with HTTP is data exchange between a user application that accesses web resources (usually a web browser) and a web server. Today, the World Wide Web operates thanks to HTTP.
HTTP is also often used to carry other application-level protocols such as SOAP, XML-RPC and WebDAV. In this case, HTTP is said to be used as a “transport”.
The APIs of many software products also use HTTP for data transfer. The data itself can be in any format, for example XML or JSON.
Typically, HTTP data is transferred over TCP/IP connections. Server software usually listens on TCP port 80, although any other port can be used. If no port is specified explicitly, client software also uses port 80 by default when opening HTTP connections.
How to send an HTTP request?
The easiest way to understand HTTP is to try accessing a web resource manually. Imagine that you are a browser and your user really wants to read articles by Anatoly Alizar. Let's assume that they entered the following in the address bar:
http://alizar.habrahabr.ru/
Accordingly, you, as a web browser, now need to connect to the web server at alizar.habrahabr.ru.
To do this, you can use any suitable command-line utility, for example telnet:
telnet alizar.habrahabr.ru 80
Note right away that if you change your mind, you can close the connection. Press Ctrl+] to reach the telnet> prompt, then type quit and press Enter. Instead of telnet, you can also use nc (or ncat), depending on your taste.
After you connect to the server, you need to send an HTTP request. This, by the way, is very easy - HTTP requests can consist of just two lines.
To compose an HTTP request, you need a start line and at least one header: Host, which is mandatory in every HTTP/1.1 request. The reason is that domain names are resolved to IP addresses on the client side. As a result, when you open a TCP connection, the remote server receives no information about which name was used for the connection. It could have been, for example, alizar.habrahabr.ru, habrahabr.ru or m.habrahabr.ru, and in each case the response may differ. In every case, however, the network connection is actually opened to the same node, 212.24.43.44. Even if a domain name rather than this IP address was given when opening the connection, the server is not informed of it in any way. That is why the name must be passed in the Host header.
The start (initial) line of an HTTP 1.1 request is composed according to the following scheme:
Method URI HTTP/Version
For example, the following start line requests the site's home page:
GET / HTTP/1.1
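The same two-line request can also be sent from a program instead of telnet. Below is a minimal Python sketch that uses only the standard socket module. The function names are hypothetical. The extra Connection: close header is not required; it is added only so that the server closes the connection after responding, which makes it easy to tell when the whole response has been read.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Compose a minimal HTTP/1.1 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",  # start line: method, resource, protocol version
        f"Host: {host}",         # the mandatory header
        "Connection: close",     # optional: ask the server to close after responding
        "",                      # empty line terminates the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")  # HTTP lines end with CR LF


def fetch(host: str, port: int = 80, path: str = "/") -> bytes:
    """Do what the telnet session does: connect, send the request, read the reply."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closed the connection: response is complete
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch("example.com") returns the raw status line, headers and body as a single bytes object.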
And, of course, don’t forget that any technology becomes much simpler and clearer when you actually start using it.
Good luck and fruitful learning!
HTTP is a protocol that allows you to fetch various resources, such as HTML documents. It underlies data exchange on the Internet. HTTP is a client-server protocol, which means that requests to the server are initiated by the recipient itself, usually a web browser. The final document is reconstructed from various sub-documents, for example separately obtained text, a description of the document layout, images, video files, scripts and much more.
Clients and servers communicate by exchanging individual messages (rather than a continuous data stream). Messages sent by the client, usually a web browser, are called requests. Messages sent by the server in reply are called responses.
Although HTTP was developed in the early 1990s, it has been continually improved due to its extensibility. HTTP is an application layer protocol that most often uses the capabilities of another protocol - TCP (or TLS - secure TCP) - to forward its messages, but any other reliable transport protocol can theoretically be used to deliver such messages. Due to its extensibility, it is used not only for the client to receive hypertext documents or images and videos, but also for transmitting content to servers, for example, using HTML forms. HTTP can also be used to retrieve only parts of a document for the purpose of updating a web page on demand.
Components of HTTP-based systems
HTTP is a client-server protocol, that is, requests are sent by one party - the user-agent (or a proxy instead). Most often, a web browser acts as a user agent, but it can be anyone, for example, a robot traveling the Web to replenish and update web page indexing data for search engines.
Each individual request is sent to the server, which processes it and returns a response. Between the client and the server there are numerous intermediaries called proxies. They perform various operations and act, for example, as gateways or caches.
In reality, there are many more intermediary devices between the browser and the server that play some role in processing the request: routers, modems and so on. Because the network is built as a system of interaction layers, these intermediaries are “hidden” at the network and transport layers. In this layered system, HTTP occupies the topmost level, called the application layer. Knowledge of the lower layers (presentation, session, transport, network, data link and physical) is essential for understanding networks and diagnosing possible problems. However, it is not required to describe and understand HTTP.
Client: user agent
A user agent is any tool or device that acts on behalf of a user. This role is primarily played by the web browser. In some cases, user agents are programs used by engineers and web developers to debug their applications.
The browser is always the entity that initiates the request. The server never does this, although over the many years of the Web's existence, mechanisms have been created that can simulate server-initiated messages.
To display a web page, the browser sends an initial request to obtain the page's HTML document. It then parses this document and requests the additional files needed to display the page's content: executable scripts, layout information in CSS style sheets, and additional resources such as images and video files. Next, the browser combines all these resources and presents them to the user as a single document, the web page. Scripts executed by the browser can fetch additional resources over the network at later stages of processing, and the browser updates the user's view of the page accordingly.
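The step where the browser parses the HTML and discovers the additional files it must request can be sketched in Python with the standard html.parser module. This is a simplified approximation of what a real browser does. It only looks at script, img and video src attributes and at stylesheet link elements, and it resolves relative URLs against a hypothetical page address.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SubresourceFinder(HTMLParser):
    """Collects URLs of files a browser would request after loading the HTML page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # HTMLParser lowercases tag and attribute names
        if tag in ("script", "img", "video") and attrs.get("src"):
            url = attrs["src"]
        elif tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower().split() and attrs.get("href"):
            url = attrs["href"]
        else:
            return  # ordinary links (<a href>) are navigation, not page resources
        self.urls.append(urljoin(self.base_url, url))  # resolve relative URLs


page = (
    '<html><head><link rel="stylesheet" href="/css/main.css">'
    '<script src="app.js"></script></head>'
    '<body><img src="img/logo.png"><a href="/other">next</a></body></html>'
)
finder = SubresourceFinder("http://example.com/blog/")
finder.feed(page)
print(finder.urls)
# ['http://example.com/css/main.css', 'http://example.com/blog/app.js', 'http://example.com/blog/img/logo.png']
```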
A web page is a hypertext document. This means that some parts of the displayed text are links that can be activated (usually by clicking a mouse button) to retrieve and therefore display a new web page. This allows the user to direct their user agent when navigating the Web. The browser translates these “traffic directions” into HTTP requests and then interprets the HTTP responses in a user-friendly way.
Web server
On the other side of the communication channel is the server, which serves the user by providing documents on request. From the end user's point of view, the server is always a single virtual machine that fully or partially generates the document. In fact, it may be a group of servers sharing the load (that is, requests from different users are distributed among them). It may also be complex software that queries other computers, such as caching servers, database servers, e-commerce application servers and others.
A server is not necessarily located on one machine, and conversely, several servers can be hosted on the same machine. With HTTP/1.1 and the Host header, they can even share the same IP address.
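Name-based virtual hosting, where several sites share one IP address and the server tells them apart by the Host header, can be sketched as a simple lookup. The host names come from the earlier telnet example. The document-root paths and the function name are hypothetical, and a real web server would do this in its configuration rather than in code.

```python
from typing import Optional

# Hypothetical document roots for sites sharing one IP address.
SITES = {
    "alizar.habrahabr.ru": "/var/www/alizar",
    "habrahabr.ru": "/var/www/habr",
    "m.habrahabr.ru": "/var/www/mobile",
}


def document_root(host_header: Optional[str]) -> Optional[str]:
    """Choose the virtual site that should answer, based on the Host header."""
    if not host_header:
        # An HTTP/1.1 server must reject a request without Host (400 Bad Request).
        return None
    # Drop an optional ":port" suffix; host names are case-insensitive.
    # (IPv6 literals such as "[::1]:8080" are not handled in this sketch.)
    name = host_header.split(":", 1)[0].strip().lower()
    return SITES.get(name)
```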
Proxy
Between the web browser and the server there are many network nodes that transmit HTTP messages. Because of the layered structure of the network, most of them operate at the transport, network or physical level. They remain transparent at the HTTP layer, although they can noticeably affect performance. Those that operate at the application level are called proxies. They can be transparent (forwarding requests without modifying them) or non-transparent (modifying requests before passing them on), and can perform many functions:
- caching (cache can be public or private, like browser cache)
- filtering (like antivirus scanning, parental controls, …)
- load balancing (allow multiple servers to serve different requests)
- authentication (control access to different resources)
- logging (storing the history of transactions)
Basic Aspects of HTTP
HTTP is simple
Even with the greater complexity introduced in HTTP/2 by encapsulating HTTP messages in frames, HTTP is generally simple and human-readable. HTTP messages can be read and understood by humans, providing easier testing for developers and reduced complexity for new users.
HTTP is extensible
The HTTP headers introduced in HTTP/1.0 made the protocol easy to extend and experiment with. New functionality can even be introduced by a simple agreement between client and server on the semantics of the new header.
HTTP is stateless but has a session
HTTP is stateless: there is no link between two requests executed one after another over the same connection. This immediately causes problems for users who interact with a page step by step, for example when using a shopping cart in an online store. While the core of HTTP is stateless, cookies make stateful sessions possible. Thanks to header extensibility, cookies are added to the workflow, allowing a session to share some context, or state, across HTTP requests.
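Here is a minimal sketch of a cookie-based session using Python's standard http.cookies module. The server sends a session identifier in a Set-Cookie header. The browser returns it in the Cookie header of later requests, and the server uses it to look up state it keeps itself. The cookie name, its value and the in-memory session store are hypothetical.

```python
from http.cookies import SimpleCookie

# Hypothetical server-side session store: the state itself stays on the server.
sessions = {"a1b2c3": {"cart": ["book", "lamp"]}}

# First response: the server hands the browser a session identifier.
outgoing = SimpleCookie()
outgoing["session_id"] = "a1b2c3"
outgoing["session_id"]["path"] = "/"
set_cookie_header = outgoing.output()  # a complete "Set-Cookie: ..." header line

# Later request: the browser sends the value back in its Cookie header.
cookie_header_value = "session_id=a1b2c3; lang=en"
incoming = SimpleCookie()
incoming.load(cookie_header_value)
session_id = incoming["session_id"].value

# The otherwise stateless request is now linked to the stored state.
cart = sessions[session_id]["cart"]
print(set_cookie_header)
print(cart)  # ['book', 'lamp']
```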
HTTP and connections
The connection is managed at the transport layer and is therefore fundamentally outside the scope of HTTP. HTTP does not require the underlying transport protocol to be connection-based. It only requires it to be reliable, that is, not to lose messages (or at least to report an error when it does). Of the two most common transport protocols on the Internet, TCP is reliable and UDP is not. HTTP therefore relies on the TCP standard, which is connection-based, even though a connection is not strictly required.
HTTP/1.0 opened a separate TCP connection for each request/response exchange, which has two major drawbacks. First, opening a connection takes several round trips of messages and is therefore slow. Second, a connection only becomes efficient when several messages are sent over it regularly: warm connections are more efficient than cold ones.
To mitigate these shortcomings, HTTP/1.1 introduced pipelining (which proved difficult to implement) and persistent connections. With persistent connections, the underlying TCP connection can be partially controlled with the Connection header. HTTP/2 went a step further by multiplexing messages over a single connection, which helps keep the connection warm and more efficient.
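Persistent connections can be observed with Python's standard library. The sketch below starts a throwaway local server that speaks HTTP/1.1 and replies with the client's TCP source port. It then sends several requests through one http.client.HTTPConnection. If every reply carries the same port, all requests travelled over the same TCP connection. The handler and function names are hypothetical, and the server listens on a random free port on 127.0.0.1.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoPortHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep the connection open between requests

    def do_GET(self):
        # Reply with the client's TCP source port: same port = same connection.
        body = str(self.client_address[1]).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # Content-Length tells the client where this response ends on a shared connection.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def ports_seen(requests: int = 3) -> list:
    """Send several requests over one HTTPConnection; return the ports the server saw."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoPortHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        ports = []
        for _ in range(requests):
            conn.request("GET", "/")
            response = conn.getresponse()
            ports.append(response.read().decode("ascii"))  # read fully before the next request
        conn.close()
        return ports
    finally:
        server.shutdown()
        server.server_close()


print(ports_seen())  # three identical port numbers
```

The handler sends Content-Length because, without it, the client could not tell where one response ends and the next begins on a shared connection.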
Experiments have been carried out to develop a better transport protocol for HTTP. For example, Google experimented with QUIC, which is built on UDP, to provide a more reliable and efficient transport. QUIC has since been standardized (RFC 9000) and is used as the transport for HTTP/3.
What can be controlled via HTTP
The natural extensibility of HTTP has, over time, allowed more control over the Web and more functionality. Caching and authentication methods were handled early in HTTP's history. The ability to relax the origin constraint, by contrast, was added only in the 2010s.
Listed below are common features that can be controlled with HTTP:
- Caching
The server can instruct proxies and clients what to cache and for how long. The client can instruct intermediate cache proxies to ignore the stored document.
- Relaxing the origin constraint
To prevent snooping and other privacy invasions, web browsers enforce strict separation between websites: only pages from the same origin can access all the information of a web page. Such a constraint is a burden for the server, but HTTP headers can relax this strict separation on the server side. This allows a document to combine information from different domains, and there may even be security-related reasons to do so.
- Authentication
Some pages are available only to specific users. Basic authentication can be provided by HTTP, either with the WWW-Authenticate and similar headers or by setting up a specific session using cookies.
- Proxies and tunneling
Servers and/or clients are often located on an intranet and hide their true IP addresses from other computers. HTTP requests then go through proxies to cross this network barrier. Not all proxies are HTTP proxies: the SOCKS protocol, for example, operates at a lower level. Other protocols, such as FTP, can also be handled by these proxies.
- Sessions
Using HTTP cookies lets you link a request to a state on the server. This creates a session, even though HTTP is a stateless protocol at its core. This is useful not only for shopping carts in online stores, but also for any site that lets users customize its output.
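Two of these mechanisms come down to simple header manipulation. The Python sketch below uses hypothetical function names. It builds a Basic authentication header, using the well-known example credentials from the Basic scheme's specification. It also reads the max-age directive, which a server uses in Cache-Control to say how long, in seconds, a response may be cached.

```python
import base64
from typing import Optional


def basic_auth_header(user: str, password: str) -> str:
    """Build the Authorization header of the Basic scheme."""
    # Base64 of "user:password". Base64 is an encoding, not encryption,
    # so Basic authentication should only be used over HTTPS.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Authorization: Basic {token}"


def max_age(cache_control: str) -> Optional[int]:
    """Return the max-age directive (seconds) of a Cache-Control value, if any."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.strip().lower() == "max-age" and value.strip().isdigit():
            return int(value)
    return None


print(basic_auth_header("Aladdin", "open sesame"))
# Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
print(max_age("public, max-age=3600"))  # 3600
```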
HTTP flow
When a client wants to communicate with a server, whether it is a final server or an intermediate proxy, it follows these steps:
- Opening a TCP connection: A TCP connection will be used to send a request or requests and receive a response. The client can open a new connection, reuse an existing one, or open multiple TCP connections to the server.
- Sending an HTTP message: before HTTP/2, HTTP messages are human-readable. Since HTTP/2, these simple messages are encapsulated in frames, which makes them impossible to read directly, but in principle they remain the same. For example:
  GET / HTTP/1.1
  Host: site
  Accept-Language: fr
- Reading the response sent by the server, for example:
  HTTP/1.1 200 OK
  Date: Sat, 09 Oct 2010 14:28:02 GMT
  Server: Apache
  Last-Modified: Tue, 01 Dec 2009 20:18:22 GMT
  ETag: "51142bc1-7449-479b075b2891b"
  Accept-Ranges: bytes
  Content-Length: 29769
  Content-Type: text/html
- Closing or reusing the connection for further requests.
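Reading the response comes down to splitting it at the empty line and parsing the status line and headers. Below is a minimal Python sketch with a hypothetical function name. The input is a shortened version of the response shown above, with a made-up five-byte body (so Content-Length is 5). The sketch ignores chunked transfer encoding and other complications that a real client must handle.

```python
# A shortened version of the response above, with a made-up 5-byte body.
RAW_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Sat, 09 Oct 2010 14:28:02 GMT\r\n"
    b"Server: Apache\r\n"
    b"Content-Length: 5\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"hello"
)


def parse_response(raw: bytes):
    """Split a raw HTTP/1.x response into status line parts, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # the empty line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    parts = status_line.split(" ", 2)  # version, code, reason phrase
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    length = int(headers.get("content-length", len(body)))
    return version, code, reason, headers, body[:length]


version, code, reason, headers, body = parse_response(RAW_RESPONSE)
print(version, code, reason)    # HTTP/1.1 200 OK
print(headers["content-type"])  # text/html
print(body)                     # b'hello'
```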
If HTTP pipelining is enabled, several requests can be sent without waiting for the first response to be fully received. HTTP pipelining is difficult to deploy in existing networks, where old pieces of software coexist with modern versions. In HTTP/2, pipelining was replaced by more robust multiplexing of requests within frames.
HTTP messages
In HTTP/1.1 and earlier, HTTP messages are human-readable. In HTTP/2, these messages are embedded in a new binary structure, the frame, which allows optimizations such as header compression and multiplexing. Only part of the original HTTP message may be sent in this version of HTTP. Even so, the semantics of each message are unchanged, and the client reconstructs (virtually) the original HTTP/1.1 request. It is therefore useful to understand HTTP/2 messages in the HTTP/1.1 format.
HTTP (HyperText Transfer Protocol) was developed as the basis of the World Wide Web.
The HTTP protocol works as follows: the client program establishes a TCP connection with the server (standard port number 80) and issues an HTTP request to it. The server processes this request and issues an HTTP response to the client.
HTTP request structure
An HTTP request consists of a request header and a request body, separated by an empty line. The request body may be missing.
The request header consists of the main (first) line of the request and subsequent lines that clarify the request in the main line. Subsequent lines may also be missing.
The main request line consists of three parts separated by spaces:
Method (in other words, the HTTP command):
GET - request a document. The most commonly used method; in HTTP/0.9 it was the only one.
HEAD - request only the document's headers. It differs from GET in that the response contains only the header with information about the document. The document itself is not sent.
POST - this method is used to pass data to CGI scripts. The data itself is sent after the header, in the request body, in the form of parameters.
PUT - place a document on the server. As far as I know, it is rarely used. A request with this method has a body in which the document itself is transmitted.
Resource - the path to a specific file on the server that the client wants to receive (or, for the PUT method, to place). If the resource is simply a file to be read, the server must return it in the response body. If it is the path to a CGI script, the server runs the script and returns the result of its execution. Thanks to this unification of resources, the client practically does not care what the resource actually is on the server.
Protocol version - the version of the HTTP protocol the client program works with.
So a simple HTTP request might look like this:
This requests the root file from the web server's root directory.
The lines after the main request line have the following format:
Parameter: value
This is how request parameters are set. They are optional, and all lines after the main request line may be missing. In that case, the server uses default values or values based on the previous request (when working in Keep-Alive mode).
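The structure just described can be parsed with a few lines of Python: a main line of three space-separated parts, optional “Parameter: value” lines, an empty line and an optional body. This is a simplified sketch with a hypothetical function name and an invented example request. It does no validation and assumes CRLF line endings.

```python
def parse_request(raw: str):
    """Split a raw request into method, resource, version, parameters and body."""
    head, _, body = raw.partition("\r\n\r\n")  # empty line separates header and body
    main_line, *param_lines = head.split("\r\n")
    method, resource, version = main_line.split(" ")  # three space-separated parts
    params = {}
    for line in param_lines:  # "Parameter: value" lines, possibly none at all
        name, _, value = line.partition(":")
        params[name.strip()] = value.strip()
    return method, resource, version, params, body


# Invented example: a POST request sending form data to a CGI script.
request = (
    "POST /cgi-bin/form.cgi HTTP/1.0\r\n"
    "Host: example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "Content-Length: 13\r\n"
    "\r\n"
    "name=John+Doe"
)
method, resource, version, params, body = parse_request(request)
print(method, resource, version)       # POST /cgi-bin/form.cgi HTTP/1.0
print(params["Content-Length"], body)  # 13 name=John+Doe
```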
I will list some of the most commonly used HTTP request parameters:
Connection - can take the values Keep-Alive and close. Keep-Alive means that after this document is sent, the connection to the server is not closed and more requests can be issued. Most browsers work in Keep-Alive mode, since it allows them to download an HTML page and its images over a single connection to the server. Once set, Keep-Alive mode is maintained until the first error or until a request explicitly specifies Connection: close.
close - the connection is closed after the response to this request.
User-Agent - identifies the browser; the value is the browser's “code”, for example:
Mozilla/4.0 (compatible; MSIE 5.0; Windows 95; DigExt)
Accept - a list of content types supported by the browser, in its order of preference, for example for my IE5:
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/msword, application/vnd.ms-powerpoint, */*
This is obviously necessary for the case when the server can output the same document in different formats.
The value of this parameter is used mainly by CGI scripts to generate a response tailored for a given browser.
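Here is a sketch of how a server or CGI script might pick a format from the Accept list. Besides listing types, the HTTP specification lets clients attach quality values (q=0..1) to express preference. The hypothetical negotiate() function below uses those values. When they tie, it prefers the type that appears earlier in the list, as described above. The available formats are assumptions.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float, int]]:
    """Return (media_range, q, position) for every entry of an Accept header."""
    entries = []
    for position, item in enumerate(header.split(",")):
        media_range, *params = [part.strip() for part in item.split(";")]
        q = 1.0  # default quality when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_range.lower(), q, position))
    return entries


def specificity(media_range: str, content_type: str) -> int:
    """2 = exact match, 1 = type/*, 0 = */*, -1 = no match."""
    if media_range == content_type:
        return 2
    if media_range == "*/*":
        return 0
    range_type, _, range_sub = media_range.partition("/")
    if range_sub == "*" and range_type == content_type.partition("/")[0]:
        return 1
    return -1


def negotiate(accept: str, available: List[str]) -> Optional[str]:
    """Pick the available type the client prefers most, or None."""
    entries = parse_accept(accept)
    best, best_key = None, None
    for content_type in available:
        matches = [(specificity(r, content_type), q, pos) for r, q, pos in entries]
        matches = [m for m in matches if m[0] >= 0]
        if not matches:
            continue
        # The most specific matching range determines the quality value.
        _, q, pos = max(matches, key=lambda m: (m[0], -m[2]))
        if q <= 0:
            continue  # q=0 means "not acceptable"
        key = (q, -pos)  # higher q wins; on a tie, the earlier-listed range wins
        if best_key is None or key > best_key:
            best, best_key = content_type, key
    return best


ie5_accept = ("image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, "
              "application/vnd.ms-excel, application/msword, "
              "application/vnd.ms-powerpoint, */*")
print(negotiate(ie5_accept, ["image/png", "image/jpeg"]))  # image/jpeg
```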
Referer (this is how the header name is spelled in the standard) - the URL of the page from which you came to this resource.
Host - the name of the host from which the resource is requested. This is useful if the server hosts several virtual servers under the same IP address. In that case, the virtual server is selected by this field.
Accept-Language - the languages the client prefers. Significant for a server that can serve the same document in different language versions.
HTTP response format
The response format is very similar to the request format: it also has a header and body separated by an empty line.
The header also consists of a main line and parameter lines, but the format of the main line is different from that of the request header.
The main response line (status line) consists of 3 fields separated by spaces:
Protocol version - similar to the corresponding request parameter.
Status code - a numeric code indicating the outcome of the request. Code 200 means “everything is fine” (OK).
Reason phrase - a textual explanation of the previous code. For example, for 200 it is OK, for 500 it is Internal Server Error.
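Python's standard http.HTTPStatus enumeration knows the usual reason phrases, so a status line can be built from the code alone. The helper names below are hypothetical. HTTP classifies status codes by their first digit (2xx success, 4xx client error, 5xx server error and so on), and the second helper follows that grouping.

```python
from http import HTTPStatus


def status_line(code: int, version: str = "HTTP/1.1") -> str:
    """Build a response status line from a numeric code."""
    status = HTTPStatus(code)  # raises ValueError for codes it does not know
    return f"{version} {status.value} {status.phrase}"


def status_class(code: int) -> str:
    """Classify a status code by its first digit."""
    classes = {1: "informational", 2: "success", 3: "redirection",
               4: "client error", 5: "server error"}
    return classes.get(code // 100, "unknown")


print(status_line(200))   # HTTP/1.1 200 OK
print(status_line(500))   # HTTP/1.1 500 Internal Server Error
print(status_class(404))  # client error
```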
The most common http response parameters:
Connection - similar to the corresponding request parameter.
If the server does not support Keep-Alive (there are some), then the Connection value in the response is always close.
Therefore, in my opinion, the correct browser tactic is the following:
1. send Connection: Keep-Alive in the request;
2. judge the state of the connection by the Connection field in the response.
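The tactic above can be written as a small decision function. One refinement applies: in HTTP/1.1, connections are persistent by default unless either side sends Connection: close. An HTTP/1.0 connection, by contrast, stays open only if Keep-Alive was explicitly agreed. The function below is a hypothetical sketch that encodes those two rules.

```python
from typing import Optional


def connection_stays_open(version: str, connection: Optional[str]) -> bool:
    """Decide from the response whether the connection can be reused."""
    # The Connection header may list several comma-separated tokens.
    tokens = {t.strip().lower() for t in (connection or "").split(",") if t.strip()}
    if "close" in tokens:
        return False  # the server announced it will close the connection
    if version == "HTTP/1.1":
        return True  # HTTP/1.1 connections are persistent by default
    return "keep-alive" in tokens  # HTTP/1.0: only if Keep-Alive was confirmed
```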
Content-Type - contains the content type of the response.
Depending on the Content-Type value, the browser interprets the response as an HTML page, a GIF or JPEG image, a file to be saved to disk, or something else, and acts accordingly. For the browser, Content-Type plays the same role as the file extension does for Windows.
Some content types:
text/html - text in HTML format (web page);
text/plain - plain text (similar to Notepad);
image/jpeg - picture in JPEG format;
image/gif - the same, in GIF format;
application/octet-stream - a stream of "octets" (i.e. just bytes) to write to disk.
There are actually many more types of content.
Content-Length - the length of the response content in bytes.
Last-Modified - the date the document was last modified.
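Here is a minimal Python sketch of how a server might fill in these three headers for a static file. Content-Type is guessed from the file extension, the same correspondence discussed above. Content-Length counts bytes rather than characters, and Last-Modified uses the HTTP date format in GMT. The function name and sample values are hypothetical, and mimetypes.MimeTypes() here uses only Python's built-in extension table.

```python
import mimetypes
from datetime import datetime, timezone
from email.utils import format_datetime

# This instance maps extensions using Python's built-in table, not the system's.
_types = mimetypes.MimeTypes()


def file_headers(filename: str, body: bytes, modified: datetime) -> dict:
    """Build Content-Type, Content-Length and Last-Modified for a static file."""
    content_type, _ = _types.guess_type(filename)
    return {
        # Unknown extensions fall back to a plain byte stream to save to disk.
        "Content-Type": content_type or "application/octet-stream",
        # The length is counted in bytes, not characters.
        "Content-Length": str(len(body)),
        # HTTP dates are always given in GMT.
        "Last-Modified": format_datetime(modified, usegmt=True),
    }


headers = file_headers(
    "hello.txt",
    "Привет".encode("utf-8"),  # 6 characters, 12 bytes in UTF-8
    datetime(2009, 12, 1, 20, 18, 22, tzinfo=timezone.utc),
)
print(headers)
```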