gourley-totty-2002

Anthony Graca

Citation #

@book{10.5555/555429,
author = {Totty, Brian and Gourley, David and Sayer, Marjorie and Aggarwal, Anshu and Reddy, Sailu},
title = {Http: The Definitive Guide},
year = {2002},
isbn = {1565925092},
publisher = {O'Reilly \& Associates, Inc.},
address = {USA},
abstract = {Web technology has become the foundation for all sorts of critical networked applications and far-reaching methods of data exchange, and beneath it all is a fundamental protocol: HyperText Transfer Protocol, or HTTP. HTTP: The Definitive Guide documents everything that technical people need for using HTTP efficiently. A reader can understand how web applications work, how the core Internet protocols and architectural building blocks interact, and how to correctly implement Internet clients and servers.}
}

Summary. What are the statements being made? #

Part I. HTTP: The Web’s Foundation #

1. Overview of HTTP #

  • overview:
    • How web clients and servers communicate
    • Where resources and web content come from
    • How web transactions work
    • The format of the messages used for HTTP communication
    • The underlying TCP network transport
    • The different variations of the HTTP protocol
    • Some of the many HTTP architectural components installed aorund the internet

Web Clients and Servers #

  • web content lives on web servers.
  • Web servers speak the HTTP protocol
  • Clients send HTTP requests to servers, and the servers return the requested data in HTTP responses.
  • Example:
    • A web browser is a HTTP client
    • when you browse “http://www.oreilly.com/index.html”, the browser sends an HTTP request to the server www.oreilly.com
    • server tries to find the desired object, which is “/index.html”
    • if successful, the server sends the object to the client in an HTTP response

Resources #

  • Web servers host web resources
    • like a static file. could contain anything
    • text files, HTML files, ms word files, acrobat pdf files, jpeg image files, avi movie files, or anything else
    • can also be dynamic files based on identity or requested information. like youtube or facebook
  • a resource is any kind of content source.

Media Types #

  • HTTP tags each object with a data format label called a MIME type
    • MIME == Multipurpose Internet Mail Extensions
    • was originally used for different email systems but worked well for HTTP to describe multimedia content
  • Web clients look at the MIME type to see how it should handle the requested object
    • Content-type: image/jpeg

URIs #

  • each web server resource has a name. The resource name is called a uniform resource identifier or URI
    • like the postal address of the internet
  • URIs come in two flavors, URLs and URNs

URLs #

  • uniform resource locator is the most common form of resource identifier
  • URLs follow the 3 part format:
    1. first part is the scheme and it describes the protocol used
    2. second part gives the server address
    3. the rest names a resource on the web server
      • /specials/saw-blade.gif
  • Almost every URI is a URL

Transactions #

  • An HTTP transaction consists of a request command, sent from the client, and a response result, sent from the server
    • Communication happens with blocks of data called HTTP messages

Methods #

  • HTTP supports several different request commands, called HTTP methods
    • Every request message has a method that tells the server what action to perform
  • For example:
    • GET: Send named resource from the server to the client
    • PUT: Store data from client to a named server resource
    • DELETE: Delete the named resource from a server
    • POST: Send client data into a server gateway application
    • HEAD: Send just the HTTP headers from the response for the named resource

Status Codes #

  • Every HTTP response contains a status code.
    • 3 digit code
  • Status code classes
    • 100-199: Informational
    • 200-299: Successful
    • 300-399: Redirection
    • 400-499: Client error
    • 500-599: Server error
  • For example:
    • 200: OK. Document returned correctly
    • 302: Redirect. Go someplace else to get the resource
    • 404: Not found. Can’t find this resource

Web Pages can consist of multiple objects #

  • multiple HTTP transactions can be made to populate a web page.
    • like requesting images
  • HTTP requests can be made to different servers

2. URLs and Resources #

  • URLs are the standardized names of the Internet’s resources
  • This chapter is about
    • URL syntax and what the URL components mean and do
    • URL shortcuts
    • URL encoding
    • Common URL schemes

3. HTTP Messages #

  • This chapter is about how HTTP messages contains the data to be moved. Focuses on how to create and understand them
    • The three parts of HTTP messages (start line, headers, and entity body)
    • The differences between request and response messages
    • The various functions (methods) that request messages support
    • The various status codes that are returned with response messages
    • What the various HTTP headers do

Parts of a Message #

  • Each message consists of three parts
    1. start line :: describes the message
    2. headers :: contain attributes that describe the body
    3. body :: optionally contains data
  • Start line and headers are ASCII while body could be in binary

Message Syntax #

  • All HTTP messages are either request messages or response messages
    • Request message request an action from a web server
    • Response message carry results of a request back to a client.

Start Lines #

  • All HTTP messages begin with a start line
    • start line for a request message says what to do
    • start line for a response message says what happened
  • Request line
    • request messages ask servers to do something to a resource
    • describes what operation to perform and a request URL describing the resource to perform the method
    • also contains HTTP version
  • Response line
    • Response message carry status and any resulting data from an operation
    • contains HTTP version, numeric status code, and a textual reason phase

4. Connection Management #

  • This chapter is about:
    • how HTTP uses TCP connections
    • Delays, bottlenecks and clogs in TCP connections
    • HTTP optimizations
    • Dos and don’t for managing connections

TCP Connections #

  • 7 step process:
    1. Browser extracts the hostname from URL
    2. Browser looks up the IP address for this hostname from DNS
    3. Browser gets the port number, usually implied 80 or 443
    4. Browser makes a TCP connection to ip address at a specific port
    5. Browser sents a HTTP GET request message to the server
    6. Browser reads the HTTP response message from server
    7. Browser closes the connection

Part II. HTTP Architecture #

5. Web Servers #

  • p 109

6. Proxies #

  • p 129

7. Caching #

  • p 161

8. Integration Points: Gateways, Tunnels, and Relays #

  • p 197

9. Web Robots #

  • p 215

10. HTTP-NG #

  • p 247

Part III. Identification, Authorization, and Security #

11. Client Identification and Cookies #

  • p 257

12. Basic Authentication #

  • p 277

13. Digest Authentication #

  • p 286

14. Secure HTTP #

  • p 306

Part IV. Entities, Encodings, and Internationalization #

15. Entities and Encodings #

  • p 341

16. Internationalization #

  • p 370

17. Content Negotiation and Transcoding #

  • p 395

Part V. Content Publishing and Distribution #

18. Web Hosting #

  • p 411

19. Publishing Systems #

  • p 424

20. Redirection and Load Balancing #

  • p 448

21. Logging and Usage Tracking #

  • p 483

Read after #

Web use cases #

  • Sharing documents
  • making resources easily accessible and findable
  • embedded servers to control devices