File Transfer Protocol.
File
Transfer Protocol (FTP) is a standard network protocol used to transfer
files from one host to another host over a TCP-based network, such as
the Internet.
FTP is built on a client-server architecture
and uses separate control and data connections between the client and
the server.[1] FTP users may authenticate themselves using a clear-text
sign-in protocol, normally in the form of a username and password, but
can connect anonymously if the server is configured to allow it. For
secure transmission that hides (encrypts) the username and password, and
encrypts the content, FTP is often secured with SSL/TLS ("FTPS"). SSH
File Transfer Protocol ("SFTP") is sometimes also used instead, but is
technologically different.
The first FTP client applications
were command-line applications developed before operating systems had
graphical user interfaces, and are still shipped with most Windows,
Unix, and Linux operating systems.[2][3] Dozens of FTP clients and
automation utilities have since been developed for desktops, servers,
mobile devices, and hardware, and FTP has been incorporated into
hundreds of productivity applications, such as Web page editors.
History
The
original specification for the File Transfer Protocol was written by
Abhay Bhushan and published as RFC 114 on 16 April 1971 and later
replaced by RFC 765 (June 1980) and RFC 959 (October 1985), the current
specification. Several proposed standards amend RFC 959, for example RFC
2228 (June 1997) proposes security extensions and RFC 2428 (September
1998) adds support for IPv6 and defines a new type of passive mode.[4]
Protocol overview
Communication and data transfer
The protocol was first specified in June 1980 and updated in RFC 959,[2] which is summarized here.[5]
Illustration of starting a passive connection using port 21
FTP
may run in active or passive mode, which determines how the data
connection is established.[6] In active mode, the client creates a TCP
control connection to the server and sends the server the client's IP
address and an arbitrary client port number, and then waits until the
server initiates the data connection over TCP to that client IP address
and client port number.[7] In situations where the client is behind a
firewall and unable to accept incoming TCP connections, passive mode may
be used. In this mode, the client uses the control connection to send a
PASV command to the server and then receives a server IP address and
server port number from the server,[7][6] which the client then uses to
open a data connection from an arbitrary client port to the server IP
address and server port number received.[5] Both modes were updated in
September 1998 to support IPv6. Further changes were introduced to the
passive mode at that time, updating it to extended passive mode.[8]
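As an illustration of the two modes, the following is a minimal sketch using Python's standard-library ftplib; the host name is a made-up placeholder and it assumes the server permits anonymous access.
from ftplib import FTP

ftp = FTP("ftp.example.com")   # opens the control connection (port 21)
ftp.login()                    # anonymous login, if the server allows it

# Passive mode (ftplib's default): the client opens the data connection to an
# address and port announced by the server, which works from behind NAT.
ftp.set_pasv(True)
ftp.retrlines("LIST")          # directory listing over a data connection

# Active mode: the client sends a PORT command and waits for the server to
# connect back, which typically fails behind firewalls or NAT.
ftp.set_pasv(False)
ftp.retrlines("LIST")

ftp.quit()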
The
server responds over the control connection with three-digit status
codes in ASCII with an optional text message. For example "200" (or "200
OK") means that the last command was successful. The numbers represent
the code for the response and the optional text represents a
human-readable explanation or request (e.g. <Need account for storing
file>).[1] An ongoing transfer of file data over the data connection
can be aborted using an interrupt message sent over the control
connection.
While transferring data over the network, four data representations can be used:[2][3][4]
* ASCII mode: used for text. Data is converted, if needed, from the
sending host's character representation to "8-bit ASCII" before
transmission, and (again, if necessary) to the receiving host's
character representation. As a consequence, this mode is inappropriate
for files that contain data other than plain text.
* Image
mode (commonly called Binary mode): the sending machine sends each file
byte for byte, and the recipient stores the bytestream as it receives
it. (Image mode support has been recommended for all implementations of
FTP).
* EBCDIC mode: used for plain text between hosts using the EBCDIC character set. This mode is otherwise like ASCII mode.
* Local mode: Allows two computers with identical setups to send data
in a proprietary format without the need to convert it to ASCII.
For
text files, different format control and record structure options are
provided. These features were designed to facilitate files containing
Telnet or ASA formatting.
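To make the distinction between the text and image representations concrete, here is a sketch using Python's standard-library ftplib, with hypothetical host and file names; ftplib issues TYPE A (ASCII) before retrlines() and TYPE I (image) before retrbinary().
from ftplib import FTP

ftp = FTP("ftp.example.com")
ftp.login()

# ASCII mode: line endings are converted, so this is only safe for plain text.
with open("readme.txt", "w", encoding="ascii", errors="replace") as f:
    ftp.retrlines("RETR readme.txt", lambda line: f.write(line + "\n"))

# Image (binary) mode: the file is stored byte for byte, which is what you
# want for archives, images, and executables.
with open("archive.zip", "wb") as f:
    ftp.retrbinary("RETR archive.zip", f.write)

ftp.quit()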
Data transfer can be done in any of three modes:[1][2]
* Stream mode: Data is sent as a continuous stream, relieving FTP from
doing any processing. Rather, all processing is left up to TCP. No
End-of-file indicator is needed, unless the data is divided into
records.
* Block mode: FTP breaks the data into several
blocks (block header, byte count, and data field) and then passes it on
to TCP.[4]
* Compressed mode: Data is compressed using a single algorithm (usually run-length encoding).
Login
FTP
login utilizes a normal username and password scheme for granting
access.[2] The username is sent to the server using the USER command,
and the password is sent using the PASS command.[2] If the information
provided by the client is accepted by the server, the server will send a
greeting to the client and the session will commence.[2] If the server
supports it, users may log in without providing login credentials, but
the same server may authorize only limited access for such sessions.[2]
Anonymous FTP
A
host that provides an FTP service may provide anonymous FTP access.[2]
Users typically log into the service with an 'anonymous' (lower-case and
case-sensitive in some FTP servers) account when prompted for user
name. Although users are commonly asked to send their email address
instead of a password,[3] no verification is actually performed on the
supplied data.[9] Many FTP hosts whose purpose is to provide software
updates will allow anonymous logins.[3]
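A minimal sketch of both login styles with Python's ftplib follows; the host and credentials are placeholders, and login() issues the USER and PASS commands described above over the control connection.
from ftplib import FTP

# Named account: sends "USER alice" followed by "PASS secret".
ftp = FTP("ftp.example.com")
print(ftp.getwelcome())          # the server's greeting banner
ftp.login(user="alice", passwd="secret")
ftp.quit()

# Anonymous access: with no arguments, login() sends "USER anonymous" and a
# conventional placeholder password; many servers expect an email address.
ftp = FTP("ftp.example.com")
ftp.login()                      # or ftp.login("anonymous", "user@example.com")
ftp.quit()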
NAT and firewall traversal
FTP
normally transfers data by having the server connect back to the
client, after the PORT command is sent by the client. This is
problematic for both NATs and firewalls, which do not allow connections
from the Internet towards internal hosts.[10] For NATs, an additional
complication is that the representation of the IP addresses and port
number in the PORT command refer to the internal host's IP address and
port, rather than the public IP address and port of the NAT.
There
are two approaches to this problem. One is that the FTP client and FTP
server use the PASV command, which causes the data connection to be
established from the FTP client to the server.[10] This is widely used
by modern FTP clients. Another approach is for the NAT to alter the
values of the PORT command, using an application-level gateway for this
purpose.[10]
Differences from HTTP
FTP is considered out-of-band control, as opposed to in-band control which is used by HTTP.[11]
Web browser support
Most
common web browsers can retrieve files hosted on FTP servers, although
they may not support protocol extensions such as FTPS.[3][12] When an
FTP—rather than an HTTP—URL is supplied, the accessible contents on the
remote server are presented in a manner that is similar to that used for
other Web content. A full-featured FTP client can be run within Firefox
in the form of an extension called FireFTP.
Syntax
FTP
URL syntax is described in RFC 1738,[13] taking the form:
ftp://[<user>[:<password>]@]<host>[:<port>]/<url-path>[13]
(The bracketed parts are optional.)
More details on
specifying a username and password may be found in the browsers'
documentation, such as, for example, Firefox [14] and Internet
Explorer.[15] By default, most web browsers use passive (PASV) mode,
which more easily traverses end-user firewalls.
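For illustration, such a URL can be decomposed with Python's standard urllib.parse module; the URL below is a made-up example.
from urllib.parse import urlparse

parts = urlparse("ftp://alice:secret@ftp.example.com:2121/pub/file.txt")

print(parts.scheme)     # 'ftp'
print(parts.username)   # 'alice'
print(parts.password)   # 'secret'
print(parts.hostname)   # 'ftp.example.com'
print(parts.port)       # 2121
print(parts.path)       # '/pub/file.txt'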
Security
FTP
was not designed to be a secure protocol—especially by today's
standards—and has many security weaknesses.[16] In May 1999, the authors
of RFC 2577 listed FTP's vulnerability to the following problems:[17]
* Brute force attacks
* Bounce attacks
* Packet capture (sniffing)
* Port stealing
* Spoof attacks
* Username protection
FTP
is not able to encrypt its traffic; all transmissions are in clear
text, and usernames, passwords, commands and data can be easily read by
anyone able to perform packet capture (sniffing) on the network.[2][16]
This problem is common to many of the Internet Protocol specifications
(such as SMTP, Telnet, POP and IMAP) that were designed prior to the
creation of encryption mechanisms such as TLS or SSL.[4] A common
solution to this problem is to use the "secure", TLS-protected versions
of the insecure protocols (e.g. FTPS for FTP, TelnetS for Telnet, etc.)
or a different, more secure protocol that can handle the job, such as
the SFTP/SCP tools included with most implementations of the Secure
Shell protocol.
Secure FTP
There are several methods of securely transferring files that have been called "Secure FTP" at one point or another.
FTPS
Explicit
FTPS is an extension to the FTP standard that allows clients to request
that the FTP session be encrypted. This is done by sending the "AUTH
TLS" command. The server has the option of allowing or denying
connections that do not request TLS. This protocol extension is defined
in the proposed standard RFC 4217. Implicit FTPS is a deprecated
standard for FTP that required the use of an SSL or TLS connection. It
was specified to use different ports than plain FTP.
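As a sketch of explicit FTPS, Python's standard ftplib.FTP_TLS sends the "AUTH TLS" command when logging in and can additionally protect the data connections; the host and credentials below are placeholders.
from ftplib import FTP_TLS

ftps = FTP_TLS("ftp.example.com")
ftps.login("alice", "secret")   # the control connection is upgraded to TLS first
ftps.prot_p()                   # switch the data connections to TLS as well
ftps.retrlines("LIST")          # directory listing over the protected channel
ftps.quit()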
SFTP
SFTP,
the "SSH File Transfer Protocol," is not related to FTP except that it
also transfers files and has a similar command set for users. SFTP, or
secure FTP, is a program that uses Secure Shell (SSH) to transfer files.
Unlike standard FTP, it encrypts both commands and data, preventing
passwords and sensitive information from being transmitted openly over
the network. It is functionally similar to FTP, but because it uses a
different protocol, standard FTP clients cannot be used to talk to an
SFTP server, nor can one connect to an FTP server with a client that
supports only SFTP.
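A minimal SFTP sketch is shown below using the third-party paramiko library (an assumption here; it is not part of the Python standard library or of any FTP specification). Host, credentials, and paths are placeholders, and the relaxed host-key policy is used only to keep the example short.
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())   # skips host-key checking; for brevity only
client.connect("sftp.example.com", username="alice", password="secret")

sftp = client.open_sftp()                      # SFTP runs inside the SSH session
sftp.get("/remote/report.pdf", "report.pdf")   # download a file
sftp.put("notes.txt", "/remote/notes.txt")     # upload a file
sftp.close()
client.close()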
FTP over SSH (not SFTP)
FTP over
SSH (not SFTP) refers to the practice of tunneling a normal FTP session
over an SSH connection.[16] Because FTP uses multiple TCP connections
(unusual for a TCP/IP protocol that is still in use), it is particularly
difficult to tunnel over SSH. With many SSH clients, attempting to set
up a tunnel for the control channel (the initial client-to-server
connection on port 21) will protect only that channel; when data is
transferred, the FTP software at either end will set up new TCP
connections (data channels), which bypass the SSH connection and thus
have no confidentiality or integrity protection, etc.
Otherwise,
it is necessary for the SSH client software to have specific knowledge
of the FTP protocol, to monitor and rewrite FTP control channel messages
and autonomously open new packet forwardings for FTP data channels.
Software packages that support this mode include:
* Tectia ConnectSecure (Win/Linux/Unix) of SSH Communications Security's software suite
* Tectia Server for IBM z/OS of SSH Communications Security's software suite
* FONC (GPL licensed)
* Co:Z FTPSSH Proxy
FTP
over SSH is sometimes referred to as secure FTP; this should not be
confused with other methods of securing FTP, such as SSL/TLS (FTPS).
Other methods of transferring files using SSH that are not related to
FTP include SFTP and SCP; in each of these, the entire conversation
(credentials and data) is always protected by the SSH protocol.
FTP reply codes
Below
is a summary of the reply codes that may be returned by an FTP server.
These codes have been standardized in RFC 959 by the IETF. As stated
earlier in this article, the reply code is a three-digit value. The
first digit indicates whether the outcome was a success, a failure, or
an error or incomplete reply:
* 2yz – Success reply
* 4yz or 5yz – Failure reply
* 1yz or 3yz – Error or Incomplete reply
The second digit defines the category of the reply:
* x0z – Syntax. These replies refer to syntax errors.
* x1z – Information. Replies to requests for information.
* x2z – Connections. Replies referring to the control and data connections.
* x3z – Authentication and accounting. Replies for the login process and accounting procedures.
* x4z – Not defined.
* x5z – File system. These replies relay status codes from the server file system.
The third digit of the reply code is used to provide additional detail for each of the categories defined by the second digit.
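To illustrate how the digits combine, here is a small, purely illustrative helper (not part of any library) that maps the first two digits of a reply code to the categories listed above.
FIRST_DIGIT = {
    "1": "error or incomplete reply",
    "2": "success",
    "3": "error or incomplete reply",
    "4": "failure",
    "5": "failure",
}
SECOND_DIGIT = {
    "0": "syntax",
    "1": "information",
    "2": "connections",
    "3": "authentication and accounting",
    "4": "not defined",
    "5": "file system",
}

def describe_reply(code: str) -> str:
    """Summarise a three-digit reply code such as '230' or '425'."""
    outcome = FIRST_DIGIT.get(code[0], "unknown outcome")
    category = SECOND_DIGIT.get(code[1], "unknown category")
    return f"{code}: {outcome} ({category})"

print(describe_reply("230"))   # 230: success (authentication and accounting)
print(describe_reply("425"))   # 425: failure (connections)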
World Wide Web
The web's logo, designed by Robert Cailliau
Inventors: Tim Berners-Lee,[1] Robert Cailliau
Company: CERN
Availability: Worldwide
The
World Wide Web (abbreviated as WWW or W3,[2] commonly known as the
web) is a system of interlinked hypertext documents accessed via the
Internet. With a web browser, one can view web pages that may contain
text, images, videos, and other multimedia, and navigate between them
via hyperlinks.
Using concepts from his earlier hypertext systems
like ENQUIRE, the British engineer and computer scientist, and at that time
employee of CERN, Sir Tim Berners-Lee, now Director of the World
Wide Web Consortium (W3C), wrote a proposal in March 1989 for what would
eventually become the World Wide Web.[1] At CERN, a European research
organisation near Geneva straddling the border between France and
Switzerland,[3] Berners-Lee and Belgian computer scientist Robert
Cailliau proposed in 1990 to use hypertext "to link and access
information of various kinds as a web of nodes in which the user can
browse at will",[4] and they publicly introduced the project in December
of the same year.[5]
History
Main article: History of the World Wide Web
The NeXT Computer used by Berners-Lee. The handwritten label declares, "This machine is a server. DO NOT POWER IT DOWN!!"
In
the May 1970 issue of Popular Science magazine, Arthur C. Clarke
predicted that satellites would someday "bring the accumulated knowledge
of the world to your fingertips" using a console that would combine the
functionality of the photocopier, telephone, television and a small
computer, allowing data transfer and video conferencing around the
globe.[6]
In March 1989, Tim Berners-Lee wrote a proposal that
referenced ENQUIRE, a database and software project he had built in
1980, and described a more elaborate information management system.[7]
With
help from Robert Cailliau, he published a more formal proposal (on 12
November 1990) to build a "Hypertext project" called "WorldWideWeb" (one
word, also "W3") as a "web" of "hypertext documents" to be viewed by
"browsers" using a client–server architecture.[4] This proposal
estimated that a read-only web would be developed within three months
and that it would take six months to achieve "the creation of new links
and new material by readers, [so that] authorship becomes universal" as
well as "the automatic notification of a reader when new material of
interest to him/her has become available." While the read-only goal was
met, accessible authorship of web content took longer to mature, with
the wiki concept, blogs, Web 2.0 and RSS/Atom.[8]
The proposal
was modeled after the SGML reader Dynatext by Electronic Book
Technology, a spin-off from the Institute for Research in Information
and Scholarship at Brown University. The Dynatext system, licensed by
CERN, was a key player in the extension of SGML ISO 8879:1986 to
Hypermedia within HyTime, but it was considered too expensive and had an
inappropriate licensing policy for use in the general high energy
physics community, namely a fee for each document and each document
alteration.
The CERN datacenter in 2010 housing some WWW servers
A
NeXT Computer was used by Berners-Lee as the world's first web server
and also to write the first web browser, WorldWideWeb, in 1990. By
Christmas 1990, Berners-Lee had built all the tools necessary for a
working Web:[9] the first web browser (which was a web editor as well);
the first web server; and the first web pages,[10] which described the
project itself. On 6 August 1991, he posted a short summary of the World
Wide Web project on the alt.hypertext newsgroup.[11] This date also
marked the debut of the Web as a publicly available service on the
Internet. Many news media have reported that the first photo on the web
was uploaded by Berners-Lee in 1992, an image of the CERN house band Les
Horribles Cernettes taken by Silvano de Gennaro; Gennaro has disclaimed
this story, writing that media were "totally distorting our words for
the sake of cheap sensationalism."[12]
The first server outside
Europe was set up at the Stanford Linear Accelerator Center (SLAC) in
Palo Alto, California, to host the SPIRES-HEP database. Accounts differ
substantially as to the date of this event. The World Wide Web
Consortium says December 1992,[13] whereas SLAC itself claims
1991.[14][15] This is supported by a W3C document titled A Little
History of the World Wide Web.[16]
The crucial underlying concept
of hypertext originated with older projects from the 1960s, such as the
Hypertext Editing System (HES) at Brown University, Ted Nelson's
Project Xanadu, and Douglas Engelbart's oN-Line System (NLS). Both
Nelson and Engelbart were in turn inspired by Vannevar Bush's
microfilm-based "memex", which was described in the 1945 essay "As We
May Think".[17]
Berners-Lee's breakthrough was to marry hypertext
to the Internet. In his book Weaving The Web, he explains that he had
repeatedly suggested that a marriage between the two technologies was
possible to members of both technical communities, but when no one took
up his invitation, he finally assumed the project himself. In the
process, he developed three essential technologies:
1. a
system of globally unique identifiers for resources on the Web and
elsewhere, the universal document identifier (UDI), later known as
uniform resource locator (URL) and uniform resource identifier (URI);
2. the publishing language HyperText Markup Language (HTML);
3. the Hypertext Transfer Protocol (HTTP).[18]
The
World Wide Web had a number of differences from other hypertext systems
available at the time. The web required only unidirectional links
rather than bidirectional ones, making it possible for someone to link
to another resource without action by the owner of that resource. It
also significantly reduced the difficulty of implementing web servers
and browsers (in comparison to earlier systems), but in turn presented
the chronic problem of link rot. Unlike predecessors such as HyperCard,
the World Wide Web was non-proprietary, making it possible to develop
servers and clients independently and to add extensions without
licensing restrictions. On 30 April 1993, CERN announced that the World
Wide Web would be free to anyone, with no fees due.[19] Coming two
months after the announcement that the server implementation of the
Gopher protocol was no longer free to use, this produced a rapid shift
away from Gopher and towards the Web. An early popular web browser was
ViolaWWW for Unix and the X Window System.
Robert Cailliau, Jean-François Abramatic of IBM, and Tim Berners-Lee at the 10th anniversary of the World Wide Web Consortium.
Scholars
generally agree that a turning point for the World Wide Web began with
the introduction[20] of the Mosaic web browser[21] in 1993, a graphical
browser developed by a team at the National Center for Supercomputing
Applications at the University of Illinois at Urbana-Champaign
(NCSA-UIUC), led by Marc Andreessen. Funding for Mosaic came from the
U.S. High-Performance Computing and Communications Initiative and the
High Performance Computing and Communication Act of 1991, one of several
computing developments initiated by U.S. Senator Al Gore.[22] Prior to
the release of Mosaic, graphics were not commonly mixed with text in web
pages and the web's popularity was less than older protocols in use
over the Internet, such as Gopher and Wide Area Information Servers
(WAIS). Mosaic's graphical user interface allowed the Web to become, by
far, the most popular Internet protocol.
The World Wide Web
Consortium (W3C) was founded by Tim Berners-Lee after he left the
European Organization for Nuclear Research (CERN) in October 1994. It
was founded at the Massachusetts Institute of Technology Laboratory for
Computer Science (MIT/LCS) with support from the Defense Advanced
Research Projects Agency (DARPA), which had pioneered the Internet; a
year later, a second site was founded at INRIA (a French national
computer research lab) with support from the European Commission DG
InfSo; and in 1996, a third continental site was created in Japan at
Keio University. By the end of 1994, while the total number of websites
was still minute compared to present standards, quite a number of
notable websites were already active, many of which are the precursors
or inspiration for today's most popular services.
Connected by
the existing Internet, other websites were created around the world,
adding international standards for domain names and HTML. Since then,
Berners-Lee has played an active role in guiding the development of web
standards (such as the markup languages in which web pages are
composed), and has advocated his vision of a Semantic Web. The World
Wide Web enabled the spread of information over the Internet through an
easy-to-use and flexible format. It thus played an important role in
popularizing use of the Internet.[23] Although the two terms are
sometimes conflated in popular use, World Wide Web is not synonymous
with Internet.[24] The web is a collection of documents and both client
and server software using Internet protocols such as TCP/IP and HTTP.
Tim Berners-Lee was knighted in 2004 by Queen Elizabeth II for his
contribution to the World Wide Web.
Function
The terms
Internet and World Wide Web are often used in everyday speech without
much distinction. However, the Internet and the World Wide Web are not
the same. The Internet is a global system of interconnected computer
networks. In contrast, the web is one of the services that runs on the
Internet. It is a collection of text documents and other resources,
linked by hyperlinks and URLs, usually accessed by web browsers from web
servers. In short, the web can be thought of as an application
"running" on the Internet.[25]
Viewing a web page on the World
Wide Web normally begins either by typing the URL of the page into a web
browser or by following a hyperlink to that page or resource. The web
browser then initiates a series of communication messages, behind the
scenes, in order to fetch and display it. In the 1990s, using a browser
to view web pages—and to move from one web page to another through
hyperlinks—came to be known as 'browsing,' 'web surfing,' or 'navigating
the web'. Early studies of this new behavior investigated user patterns
in using web browsers. One study, for example, found five user
patterns: exploratory surfing, window surfing, evolved surfing, bounded
navigation and targeted navigation.[26]
The following example
demonstrates how a web browser works. Consider accessing a page with the
URL http://example.org/wiki/World_Wide_Web.
First, the browser
resolves the server-name portion of the URL (example.org) into an
Internet Protocol address using the globally distributed database known
as the Domain Name System (DNS); this lookup returns an IP address such
as 208.80.152.2. The browser then requests the resource by sending an
HTTP request across the Internet to the computer at that particular
address. It makes the request to a particular application port in the
underlying Internet Protocol Suite so that the computer receiving the
request can distinguish an HTTP request from other network protocols it
may be servicing such as e-mail delivery; the HTTP protocol normally
uses port 80. The content of the HTTP request can be as simple as the
two lines of text
GET /wiki/World_Wide_Web HTTP/1.1
Host: example.org
The
computer receiving the HTTP request delivers it to web server software
listening for requests on port 80. If the web server can fulfill the
request it sends an HTTP response back to the browser indicating
success, which can be as simple as
HTTP/1.0 200 OK
Content-Type: text/html; charset=UTF-8
followed by the content of the requested page. The Hypertext Markup Language for a basic web page looks like
<html>
<head>
<title>Example.org – The World Wide Web</title>
</head>
<body>
<p>The World Wide Web, abbreviated as WWW and commonly known ...</p>
</body>
</html>
The
web browser parses the HTML, interpreting the markup (<title>,
<p> for paragraph, and such) that surrounds the words in order to
draw the text on the screen.
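The fetch described above (DNS lookup, request, response) can be reproduced with a short sketch using only Python's standard socket module; a Connection: close header is added beyond the two lines shown earlier so the server closes the connection when the response is complete, and example.org with the /wiki/World_Wide_Web path is simply the article's illustrative URL.
import socket

host = "example.org"
request = (
    "GET /wiki/World_Wide_Web HTTP/1.1\r\n"
    "Host: example.org\r\n"
    "Connection: close\r\n"   # ask the server to close the socket when done
    "\r\n"
)

# Step 1: DNS lookup, turning the host name into an IP address.
address = socket.gethostbyname(host)

# Step 2: connect to the HTTP port (80) and send the request.
with socket.create_connection((address, 80)) as sock:
    sock.sendall(request.encode("ascii"))

    # Step 3: read the raw response (status line, headers, then the HTML body).
    response = b""
    while chunk := sock.recv(4096):
        response += chunk

print(response.decode("utf-8", errors="replace").splitlines()[0])  # e.g. "HTTP/1.1 200 OK"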
Many web pages use HTML to reference
the URLs of other resources such as images, other embedded media,
scripts that affect page behavior, and Cascading Style Sheets that
affect page layout. The browser will make additional HTTP requests to
the web server for these other Internet media types. As it receives
their content from the web server, the browser progressively renders the
page onto the screen as specified by its HTML and these additional
resources.
Most web pages contain hyperlinks to other related
pages and perhaps to downloadable files, source documents, definitions
and other web resources. In the underlying HTML, a hyperlink looks like
<a href="http://example.org/wiki/Main_Page">Example.org, a free encyclopedia</a>
Graphic representation of a minute fraction of the WWW, demonstrating hyperlinks
Such
a collection of useful, related resources, interconnected via hypertext
links is dubbed a web of information. Publication on the Internet
created what Tim Berners-Lee first called the WorldWideWeb (in its
original CamelCase, which was subsequently discarded) in November
1990.[4]
The hyperlink structure of the WWW is described by the
webgraph: the nodes of the webgraph correspond to the web pages (or
URLs), and the directed edges between them correspond to the hyperlinks.
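A minimal sketch of this idea represents the webgraph as an adjacency mapping from each page URL to the URLs it links to; the URLs below are made up.
webgraph = {
    "http://example.org/A": ["http://example.org/B", "http://example.org/C"],
    "http://example.org/B": ["http://example.org/C"],
    "http://example.org/C": [],
}

# The out-degree of a page is the number of hyperlinks it contains.
for page, links in webgraph.items():
    print(page, "links to", len(links), "pages")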
Over
time, many web resources pointed to by hyperlinks disappear, relocate,
or are replaced with different content. This makes hyperlinks obsolete, a
phenomenon referred to in some circles as link rot and the hyperlinks
affected by it are often called dead links. The ephemeral nature of the
Web has prompted many efforts to archive web sites. The Internet
Archive, active since 1996, is the best known of such efforts.
Dynamic updates of web pages
Main article: Ajax (programming)
JavaScript
is a scripting language that was initially developed in 1995 by Brendan
Eich, then of Netscape, for use within web pages.[27] The standardised
version is ECMAScript.[27] To make web pages more interactive, some web
applications also use JavaScript techniques such as Ajax (asynchronous
JavaScript and XML). Client-side script is delivered with the page that
can make additional HTTP requests to the server, either in response to
user actions such as mouse movements or clicks, or based on lapsed time.
The server's responses are used to modify the current page rather than
creating a new page with each response, so the server needs only to
provide limited, incremental information. Multiple Ajax requests can be
handled at the same time, and users can interact with the page while
data is being retrieved. Web pages may also regularly poll the server to
check whether new information is available.[28]
WWW prefix
Many
domain names used for the World Wide Web begin with www because of the
long-standing practice of naming Internet hosts (servers) according to
the services they provide. The hostname for a web server is often www,
in the same way that it may be ftp for an FTP server, and news or nntp
for a USENET news server. These host names appear as Domain Name System
(DNS) subdomain names, as in www.example.com. The
use of 'www' as a subdomain name is not required by any technical or
policy standard and many web sites do not use it; indeed, the first ever
web server was called nxoc01.cern.ch.[29] According to Paolo
Palazzi,[30] who worked at CERN along with Tim Berners-Lee, the popular
use of the 'www' subdomain was accidental; the World Wide Web project page
was intended to be published at www.cern.ch while info.cern.ch was
intended to be the CERN home page. However, the DNS records were never
switched, and the practice of prepending 'www' to an institution's
website domain name was subsequently copied. Many established websites
still use 'www', or they invent other subdomain names such as 'www2',
'secure', etc. Many such web servers are set up so that both the domain
root (e.g., example.com) and the www subdomain (e.g., www.example.com)
refer to the same site; others require one form or the other, or they
may map to different web sites.
The use of a subdomain name is
useful for load balancing incoming web traffic by creating a CNAME
record that points to a cluster of web servers. Since, currently, only a
subdomain can be used in a CNAME, the same result cannot be achieved by
using the bare domain root.
When a user submits an incomplete
domain name to a web browser in its address bar input field, some web
browsers automatically try adding the prefix "www" to the beginning of
it and possibly ".com", ".org" and ".net" at the end, depending on what
might be missing. For example, entering 'microsoft' may be transformed
to http://www.microsoft.com/ and 'openoffice' to
http://www.openoffice.org. This feature started appearing in early
versions of Mozilla Firefox, when it still had the working title
'Firebird' in early 2003, from an earlier practice in browsers such as
Lynx.[31] It is reported that Microsoft was granted a US patent for the
same idea in 2008, but only for mobile devices.[32]
In English,
www is usually read as double-u double-u double-u. Some users pronounce
it dub-dub-dub, particularly in New Zealand. Stephen Fry, in his
"Podgrammes" series of podcasts, pronounces it wuh wuh wuh. The English
writer Douglas Adams once quipped in The Independent on Sunday (1999):
"The World Wide Web is the only thing I know of whose shortened form
takes three times longer to say than what it's short for". In Mandarin
Chinese, World Wide Web is commonly translated via a phono-semantic
matching to wàn wéi wǎng (万维网), which satisfies www and literally means
"myriad dimensional net",[33] a translation that very appropriately
reflects the design concept and proliferation of the World Wide Web. Tim
Berners-Lee's web-space states that World Wide Web is officially
spelled as three separate words, each capitalised, with no intervening
hyphens.[34]
Use of the www prefix is declining as Web 2.0 web
applications seek to brand their domain names and make them easily
pronounceable.[35] As the mobile web grows in popularity, services like
Gmail.com, MySpace.com, Facebook.com, Bebo.com and Twitter.com are most
often discussed without adding www to the domain (or, indeed, the
.com).
Scheme specifiers: http and https
The scheme specifier
http:// or https:// at the start of a web URI refers to Hypertext
Transfer Protocol or HTTP Secure respectively. Unlike www, which has no
specific purpose, these specify the communication protocol to be used
for the request and response. The HTTP protocol is fundamental to the
operation of the World Wide Web and the added encryption layer in HTTPS
is essential when confidential information such as passwords or banking
information is to be exchanged over the public Internet. Web browsers
usually prepend http:// to addresses too, if omitted.
Web servers
Main article: Web server
The
primary function of a web server is to deliver web pages to clients on
request. This means delivery of HTML documents and any additional
content that may be included by a document, such as images, style sheets
and scripts.
Privacy
Main article: Internet privacy
Every time
a web page is requested from a web server the server can identify, and
usually it logs, the IP address from which the request arrived. Equally,
unless set not to do so, most web browsers record the web pages that
have been requested and viewed in a history feature, and usually cache
much of the content locally. Unless HTTPS encryption is used, web
requests and responses travel in plain text across the internet and they
can be viewed, recorded and cached by intermediate systems.
When
a web page asks for, and the user supplies, personally identifiable
information such as their real name, address, e-mail address, etc., then
a connection can be made between the current web traffic and that
individual. If the website uses HTTP cookies, username and password
authentication, or other tracking techniques, then it will be able to
relate other web visits, before and after, to the identifiable
information provided. In this way it is possible for a web-based
organisation to develop and build a profile of the individual people who
use its site or sites. It may be able to build a record for an
individual that includes information about their leisure activities,
their shopping interests, their profession, and other aspects of their
demographic profile. These profiles are obviously of potential interest
to marketeers, advertisers and others. Depending on the website's terms
and conditions and the local laws that apply, information from these
profiles may be sold, shared, or passed to other organisations without
the user being informed. For many ordinary people, this means little
more than some unexpected e-mails in their in-box, or some uncannily
relevant advertising on a future web page. For others, it can mean that
time spent indulging an unusual interest can result in a deluge of
further targeted marketing that may be unwelcome. Law enforcement,
counter terrorism and espionage agencies can also identify, target and
track individuals based on what appear to be their interests or
proclivities on the web.
Social networking sites make a point of
trying to get the user to truthfully expose their real names, interests
and locations. This makes the social networking experience more
realistic and therefore engaging for all their users. On the other hand,
photographs uploaded and unguarded statements made will be identified
to the individual, who may regret some decisions to publish these data.
Employers, schools, parents and other relatives may be influenced by
aspects of social networking profiles that the posting individual did
not intend for these audiences. On-line bullies may make use of personal
information to harass or stalk users. Modern social networking websites
allow fine grained control of the privacy settings for each individual
posting, but these can be complex and not easy to find or use,
especially for beginners.[36]
Photographs and videos posted onto
websites have caused particular problems, as they can add a person's
face to an on-line profile. With modern and potential facial recognition
technology, it may then be possible to relate that face with other,
previously anonymous, images, events and scenarios that have been imaged
elsewhere. Because of image caching, mirroring and copying, it is
difficult to remove an image from the World Wide Web.
Intellectual property
Main article: Intellectual property
The
intellectual property rights for any creative work initially rest with
its creator. Web users who want to publish their work onto the World
Wide Web, however, need to be aware of the details of the way they do
it. If artwork, photographs, writings, poems, or technical innovations
are published by their creator onto a privately owned web server, then
they may choose the copyright and other conditions freely themselves.
This is unusual though; more commonly work is uploaded to web sites and
servers that are owned by other organizations. It depends upon the terms
and conditions of the site or service provider to what extent the
original owner automatically signs over rights to their work by the
choice of destination and by the act of uploading.[citation needed]
Many
users of the web erroneously assume that everything they may find
online is freely available to them as if it were in the public domain. This
is almost never the case, unless the web site publishing the work
clearly states that it is. On the other hand, content owners are aware
of this widespread belief, and expect that sooner or later almost
everything that is published will probably be used in some capacity
somewhere without their permission. Many publishers therefore embed
visible or invisible digital watermarks in their media files, sometimes
charging users to receive unmarked copies for legitimate use. Digital
rights management includes forms of access control technology that
further limit the use of digital content even after it has been bought
or downloaded.[citation needed]
Security
The web has
become criminals' preferred pathway for spreading malware. Cybercrime
carried out on the web can include identity theft, fraud, espionage and
intelligence gathering.[37] Web-based vulnerabilities now outnumber
traditional computer security concerns,[38][39] and as measured by
Google, about one in ten web pages may contain malicious code.[40] Most
web-based attacks take place on legitimate websites, and most, as
measured by Sophos, are hosted in the United States, China and
Russia.[41] The most common of all malware threats are SQL injection
attacks against websites.[42] Through HTML and URIs, the web was
vulnerable to attacks like cross-site scripting (XSS) that came with the
introduction of JavaScript[43] and were exacerbated to some degree by
Web 2.0 and Ajax web design that favors the use of scripts.[44] Today, by
one estimate, 70% of all websites are open to XSS attacks on their
users.[45]
Proposed solutions vary to extremes. Large security
vendors like McAfee already design governance and compliance suites to
meet post-9/11 regulations,[46] and some, like Finjan have recommended
active real-time inspection of code and all content regardless of its
source.[37] Some have argued that for enterprise to see security as a
business opportunity rather than a cost center,[47] "ubiquitous,
always-on digital rights management" enforced in the infrastructure by a
handful of organizations must replace the hundreds of companies that
today secure data and networks.[48] Jonathan Zittrain has said users
sharing responsibility for computing safety is far preferable to locking
down the Internet.[49]
Standards
Main article: Web standards
Many
formal standards and other technical specifications and software define
the operation of different aspects of the World Wide Web, the Internet,
and computer information exchange. Many of the documents are the work
of the World Wide Web Consortium (W3C), headed by Berners-Lee, but some
are produced by the Internet Engineering Task Force (IETF) and other
organizations.
Usually, when web standards are discussed, the following publications are seen as foundational:
* Recommendations for markup languages, especially HTML and XHTML, from
the W3C. These define the structure and interpretation of hypertext
documents.
* Recommendations for stylesheets, especially CSS, from the W3C.
* Standards for ECMAScript (usually in the form of JavaScript), from Ecma International.
* Recommendations for the Document Object Model, from W3C.
Additional
publications provide definitions of other essential technologies for
the World Wide Web, including, but not limited to, the following:
* Uniform Resource Identifier (URI), which is a universal system for
referencing resources on the Internet, such as hypertext documents and
images. URIs, often called URLs, are defined by the IETF's RFC 3986 /
STD 66: Uniform Resource Identifier (URI): Generic Syntax, as well as
its predecessors and numerous URI scheme-defining RFCs;
*
HyperText Transfer Protocol (HTTP), especially as defined by RFC 2616:
HTTP/1.1 and RFC 2617: HTTP Authentication, which specify how the
browser and server authenticate each other.
Accessibility
Main article: Web accessibility
There
are methods available for accessing the web in alternative mediums and
formats, so as to enable use by individuals with disabilities. These
disabilities may be visual, auditory, physical, speech-related,
cognitive, neurological, or some combination thereof. Accessibility
features also help people with temporary disabilities, like a broken arm,
and the aging population as their abilities change.[50] The Web is used
for receiving information as well as providing information and
interacting with society. The World Wide Web Consortium claims it is
essential that the Web be accessible in order to provide equal access
and equal opportunity to people with disabilities.[51] Tim Berners-Lee
once noted, "The power of the Web is in its universality. Access by
everyone regardless of disability is an essential aspect."[50] Many
countries regulate web accessibility as a requirement for websites.[52]
International cooperation in the W3C Web Accessibility Initiative led to
simple guidelines that web content authors as well as software
developers can use to make the Web accessible to persons who may or may
not be using assistive technology.[50][53]
Internationalization
The
W3C Internationalization Activity assures that web technology will work
in all languages, scripts, and cultures.[54] Beginning in 2004 or 2005,
Unicode gained ground and eventually in December 2007 surpassed both
ASCII and Western European as the Web's most frequently used character
encoding.[55] Originally RFC 3986 allowed resources to be identified by
URI in a subset of US-ASCII. RFC 3987 allows more characters—any
character in the Universal Character Set—and now a resource can be
identified by IRI in any language.[56]
Statistics
Between 2005
and 2010, the number of web users doubled, and was expected to surpass
two billion in 2010.[57] Early studies in 1998 and 1999 estimating the
size of the web using capture/recapture methods showed that much of the
web was not indexed by search engines and the web was much larger than
expected.[58][59] According to a 2001 study, there was a massive
number of documents on the Web, over 550 billion, mostly in the
invisible Web, or Deep Web.[60] A 2002 survey of 2,024 million web
pages[61] determined that by far the most web content was in the English
language: 56.4%; next were pages in German (7.7%), French (5.6%), and
Japanese (4.9%). A more recent study, which used web searches in 75
different languages to sample the web, determined that there were over
11.5 billion web pages in the publicly indexable web as of the end of
January 2005.[62] As of March 2009, the indexable web contains
at least 25.21 billion pages.[63] On 25 July 2008, Google software
engineers Jesse Alpert and Nissan Hajaj announced that Google Search had
discovered one trillion unique URLs.[64] As of May 2009, over
109.5 million domains operated.[65][not in citation given] Of these, 74%
were commercial or other sites operating in the .com generic top-level
domain.[65]
Statistics measuring a website's popularity are
usually based either on the number of page views or on associated server
'hits' (file requests) that it receives.
Speed issues
Frustration
over congestion issues in the Internet infrastructure and the high
latency that results in slow browsing has led to a pejorative name for
the World Wide Web: the World Wide Wait.[66] Speeding up the Internet is
an ongoing discussion over the use of peering and QoS technologies.
Other solutions to reduce the congestion can be found at W3C.[67]
Guidelines for web response times are:[68]
* 0.1 second (one tenth of a second). Ideal response time. The user does not sense any interruption.
* 1 second. Highest acceptable response time. Download times above 1 second interrupt the user experience.
* 10 seconds. Unacceptable response time. The user experience is
interrupted and the user is likely to leave the site or system.
Caching
Main article: Web cache
If
a user revisits a web page after only a short interval, the page data
may not need to be re-obtained from the source web server. Almost all
web browsers cache recently obtained data, usually on the local hard
drive. HTTP requests sent by a browser will usually ask only for data
that has changed since the last download. If the locally cached data are
still current, they will be reused. Caching helps reduce the amount of
web traffic on the Internet. The decision about expiration is made
independently for each downloaded file, whether image, stylesheet,
JavaScript, HTML, or other web resource. Thus even on sites with highly
dynamic content, many of the basic resources need to be refreshed only
occasionally. Web site designers find it worthwhile to collate resources
such as CSS data and JavaScript into a few site-wide files so that they
can be cached efficiently. This helps reduce page download times and
lowers demands on the Web server.
There are other components of
the Internet that can cache web content. Corporate and academic
firewalls often cache Web resources requested by one user for the
benefit of all. (See also caching proxy server.) Some search engines
also store cached content from websites. Apart from the facilities built
into web servers that can determine when files have been updated and so
need to be re-sent, designers of dynamically generated web pages can
control the HTTP headers sent back to requesting users, so that
transient or sensitive pages are not cached. Internet banking and news
sites frequently use this facility. Data requested with an HTTP 'GET' is
likely to be cached if other conditions are met; data obtained in
response to a 'POST' is assumed to depend on the data that was POSTed
and so is not cached.
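A browser-style revalidation can be sketched with Python's standard urllib; the URL is a placeholder, and the outcome depends on the caching headers the server actually sends.
import urllib.request
from urllib.error import HTTPError

url = "http://example.org/style.css"

# First request: fetch the resource and remember when it was last modified.
with urllib.request.urlopen(url) as resp:
    cached_body = resp.read()
    last_modified = resp.headers.get("Last-Modified")

# Later request: ask for the body only if it has changed since the cached copy.
headers = {"If-Modified-Since": last_modified} if last_modified else {}
request = urllib.request.Request(url, headers=headers)
try:
    with urllib.request.urlopen(request) as resp:
        cached_body = resp.read()   # 200 OK: the resource changed, refresh the cache
except HTTPError as exc:
    if exc.code == 304:
        pass                        # 304 Not Modified: keep using cached_body
    else:
        raise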
Browser
Browse, browser or browsing may refer to:
* Browse, a kind of orienting strategy in animals and human beings
* Browsing (herbivory), a type of feeding behavior in herbivores
* Web browser, used to access the World Wide Web
* File browser, also known as a file manager, used to manage files and related objects
* Help browser, for reading online help
* Code browser, for navigating source code
* Browser service, a feature of Microsoft Windows to let users browse and locate shared resources in neighboring computers
How to Browse the Internet Safely
There are many criminals on the internet, and they are always looking to exploit our carelessness while we surf, especially now that doing business online is so promising. Caution is therefore essential, so that intruders cannot get into the system and wreak havoc on it.
Here are some tips for staying out of the reach of malicious hands in cyberspace.
1. Use Favorites or Bookmarks
Using Favorites or Bookmarks helps ensure that the website you open really is the online business site you signed up for, because many username and password thefts work by creating a fake website that looks exactly like the original and uses a similar URL. If, while working, you notice anything odd, such as a page layout that suddenly changes or a dropped connection followed by a page asking you to re-enter your username and password, treat it as a warning sign.
2. Use an antivirus
Make sure an antivirus is installed on the computer; use a professional product such as Norton Antivirus, McAfee Antivirus, Kaspersky, F-Secure, or another licensed vendor's antivirus. Using an antivirus helps greatly in keeping viruses off the PC. Keeping the antivirus updated is also very useful for handling newly circulating viruses.
3. Use anti-spyware and anti-adware tools
Besides viruses, spyware and adware also have to be watched for. Spyware is a small program that gets onto our computer in order to spy on our internet activity and steal important data, including usernames and passwords. Adware is similar, but is aimed more at advertising and pops up windows on the computer while browsing, often ads for pornographic websites.
4. Use a firewall
To further strengthen the computer's defences, use a firewall. Windows XP and Vista can use the standard built-in firewall, and there are currently several firewalls capable of keeping intruders out, such as Comodo Firewall and ZoneAlarm, or you can simply enable the firewall that ships with Windows.
5. Use a better web browser
Rather than using the Internet Explorer bundled with Windows, it is better to use an alternative browser that is more secure and has more advanced protection against hackers. Several browser makers constantly compete to offer users the best, such as Mozilla Firefox, Opera, and Google Chrome.
6. Remove your traces
Windows and the browser normally store cookies, history, and other records of the user's online activity. These are a source of information that lets hackers learn what the user has been doing and steal the usernames and passwords used online; hackers also commonly plant their data-stealing files in the folders where these cookies and history are kept on the computer. (Cookies are files placed on the computer when we visit a website; history is the list of our online activity kept by the browser we use.) Always delete all traces of your browsing so that hackers cannot sneak into the computer.
7. Change passwords as often as possible
Most important of all, change the passwords you use as often as possible; however skilled the hackers who steal a username and password may be, it will do them no good if the password has already changed by the time they try to log in to the online business site you use.
8. Create hard-to-guess passwords
Do not create passwords containing your date of birth, your family name, or a pet's name, or built from short phrases in everyday use. Make the password as long as is allowed, combine upper-case and lower-case letters, and use special characters such as ? > ) / & % $. Most importantly, do not keep a record of the password on the computer as a file; write it on a piece of paper kept in a safe place beside the computer, make it look as though it is not a password note, and do not keep it in your wallet, because if the wallet is lost you will have trouble later. (A short sketch of generating such a password follows after these tips.)
9. Do not be fooled by fake e-mails
If you receive an e-mail that appears to come from the operator of an online business or e-gold service you use and that asks you to send your username and password, ignore it and delete it immediately; do not click any link in it and do not open any attachment, because operators of online businesses and e-gold never send e-mails of that kind.
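Here is the sketch referred to in tip 8: a minimal example using Python's standard secrets module. The length and the character set, which reuses the special characters mentioned above, are illustrative choices.
import secrets
import string

def generate_password(length: int = 16) -> str:
    """Return a random password mixing letters, digits, and special characters."""
    alphabet = string.ascii_letters + string.digits + "?>)/&%$"
    return "".join(secrets.choice(alphabet) for _ in range(length))

print(generate_password())   # a different hard-to-guess string on every run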
Domain Name System
The
Domain Name System (DNS) is a hierarchical distributed naming system
for computers, services, or any resource connected to the Internet or a
private network. It associates various information with domain names
assigned to each of the participating entities. Most prominently, it
translates easily memorised domain names to the numerical IP addresses
needed for the purpose of locating computer services and devices
worldwide. By providing a worldwide, distributed keyword-based
redirection service, the Domain Name System is an essential component of
the functionality of the Internet.
An often-used analogy to
explain the Domain Name System is that it serves as the phone book for
the Internet by translating human-friendly computer hostnames into IP
addresses. For example, the domain name www.example.com translates to
the addresses 192.0.43.10 (IPv4) and 2001:500:88:200::10 (IPv6). Unlike a
phone book, the DNS can be quickly updated, allowing a service's
location on the network to change without affecting the end users, who
continue to use the same host name. Users take advantage of this when
they use meaningful Uniform Resource Locators (URLs) and e-mail
addresses without having to know how the computer actually locates the
services.
The Domain Name System distributes the responsibility
of assigning domain names and mapping those names to IP addresses by
designating authoritative name servers for each domain. Authoritative
name servers are assigned to be responsible for their particular
domains, and in turn can assign other authoritative name servers for
their sub-domains. This mechanism has made the DNS distributed and fault
tolerant and has helped avoid the need for a single central register to
be continually consulted and updated. Additionally, the responsibility
for maintaining and updating the master record for the domains is spread
among many domain name registrars, who compete for the end-user's (the
domain-owner's) business. Domains can be moved from registrar to
registrar at any time.
The Domain Name System also specifies the
technical functionality of this database service. It defines the DNS
protocol, a detailed specification of the data structures and data
communication exchanges used in DNS, as part of the Internet Protocol
Suite.
The Internet maintains two principal namespaces, the
domain name hierarchy[1] and the Internet Protocol (IP) address
spaces.[2] The Domain Name System maintains the domain name hierarchy
and provides translation services between it and the address spaces.
Internet name servers and a communication protocol implement the Domain
Name System.[3] A DNS name server is a server that stores the DNS
records for a domain name, such as address (A or AAAA) records, name
server (NS) records, and mail exchanger (MX) records (see also list of
DNS record types); a DNS name server responds with answers to queries
against its database.
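As a small illustration, address (A and AAAA) records can be resolved with Python's standard socket module, while other record types such as MX need a DNS client library; the dnspython usage sketched in the comment is an assumption about a third-party package, not part of the standard library.
import socket

# A and AAAA records: resolve a host name to its IPv4/IPv6 addresses.
for family, _type, _proto, _canonname, sockaddr in socket.getaddrinfo("www.example.com", None):
    print(family.name, sockaddr[0])

# MX records (mail exchangers) need a DNS client library, for example:
#   import dns.resolver
#   for record in dns.resolver.resolve("example.com", "MX"):
#       print(record.preference, record.exchange)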
History
The practice of using a name
as a simpler, more memorable abstraction of a host's numerical address
on a network dates back to the ARPANET era. Before the DNS was invented
in 1982, each computer on the network retrieved a file called HOSTS.TXT
from a computer at SRI (now SRI International).[4][5] The HOSTS.TXT file
mapped names to numerical addresses. A hosts file still exists on most
modern operating systems by default and generally contains a mapping of
"localhost" to the IP address 127.0.0.1. Many operating systems use name
resolution logic that allows the administrator to configure selection
priorities for available name resolution methods.
The rapid
growth of the network made a centrally maintained, hand-crafted
HOSTS.TXT file unsustainable; it became necessary to implement a more
scalable system capable of automatically disseminating the requisite
information.
At the request of Jon Postel, Paul Mockapetris
invented the Domain Name System in 1983 and wrote the first
implementation. The original specifications were published by the
Internet Engineering Task Force in RFC 882 and RFC 883, which were
superseded in November 1987 by RFC 1034[1] and RFC 1035.[3] Several
additional Requests for Comments have proposed various extensions to the
core DNS protocols.
In 1984, four Berkeley students—Douglas
Terry, Mark Painter, David Riggle, and Songnian Zhou—wrote the first
Unix name server implementation, called The Berkeley Internet Name
Domain (BIND) Server.[6] In 1985, Kevin Dunlap of DEC significantly
rewrote the DNS implementation. Mike Karels, Phil Almquist, and Paul
Vixie have maintained BIND since then. BIND was ported to the Windows NT
platform in the early 1990s.
BIND was widely distributed,
especially on Unix systems, and is the dominant DNS software in use on
the Internet.[7] Alternative name servers have been developed, partly
motivated by a desire to improve upon BIND's record of vulnerability to
attack. BIND version 9 was written from scratch and has a security
record comparable to that of other modern DNS software.[citation needed]
Structure
Domain name space
The
domain name space consists of a tree of domain names. Each node or leaf
in the tree has zero or more resource records, which hold information
associated with the domain name. The tree sub-divides into zones
beginning at the root zone. A DNS zone may consist of only one domain,
or may consist of many domains and sub-domains, depending on the
administrative authority delegated to the manager.
The hierarchical Domain Name System, organized into zones, each served by a name server
Administrative
responsibility over any zone may be divided by creating additional
zones. Authority is said to be delegated for a portion of the old space,
usually in the form of sub-domains, to another name server and
administrative entity. The old zone ceases to be authoritative for the
new zone.
Domain name syntax
The definitive descriptions of
the rules for forming domain names appear in RFC 1035, RFC 1123, and RFC
2181. A domain name consists of one or more parts, technically called
labels, that are conventionally concatenated, and delimited by dots,
such as example.com.
* The right-most label conveys the
top-level domain; for example, the domain name www.example.com belongs
to the top-level domain com.
* The hierarchy of domains descends
from right to left; each label to the left specifies a subdivision, or
subdomain of the domain to the right. For example: the label example
specifies a subdomain of the com domain, and www is a subdomain of
example.com. This tree of subdivisions may have up to 127 levels.
* Each label may contain up to 63 characters. The full domain name may
not exceed the length of 253 characters in its textual
representation.[1] In the internal binary representation of the DNS the
maximum length requires 255 octets of storage, since it also stores the
length of the name.[3] In practice, some domain registries may have
shorter limits.[citation needed]
* DNS names may technically
consist of any character representable in an octet. However, the allowed
formulation of domain names in the DNS root zone, and most other
subdomains, uses a preferred format and character set. The characters
allowed in a label are a subset of the ASCII character set and include
the letters a through z, A through Z, the digits 0 through 9, and the
hyphen. This rule is known as the LDH rule (letters, digits, hyphen).
Domain names are interpreted in a case-independent manner.[8] Labels may
not start or end with a hyphen.[9] There is an additional rule that
essentially requires that top-level domain names not be all-numeric.[10]
A validation sketch of these rules follows this list.
* A hostname is a domain name that has at least one IP address
associated with it. For example, the domain names www.example.com and
example.com are also hostnames, whereas the com domain is not.
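The length and character rules above can be approximated in a few lines
of code. The following Python sketch is only a rough check against the
rules of RFC 1035 and RFC 1123; it ignores internationalized names and
the optional trailing root dot, among other details.

    import re

    # Letters, digits, and hyphen; a label may not start or end with a hyphen.
    LABEL_RE = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?$")

    def is_valid_hostname(name):
        if len(name) > 253:                  # textual length limit
            return False
        labels = name.split(".")
        if labels[-1].isdigit():             # top-level domain may not be all-numeric
            return False
        return all(len(label) <= 63 and LABEL_RE.match(label) for label in labels)

    print(is_valid_hostname("www.example.com"))   # True
    print(is_valid_hostname("-bad-.example"))     # False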
Internationalized domain names
The
limited set of ASCII characters permitted in the DNS prevented the
representation of names and words of many languages in their native
alphabets or scripts. To make this possible, ICANN approved the
Internationalizing Domain Names in Applications (IDNA) system, by which
user applications, such as web browsers, map Unicode strings into the
valid DNS character set using Punycode. In 2009 ICANN approved the
installation of internationalized domain name country code top-level
domains. In addition, many registries of existing top-level domains
(TLDs) have adopted the IDNA system.
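The mapping itself can be demonstrated with Python's built-in "idna"
codec, which implements the older IDNA 2003 rules (the stricter IDNA
2008 rules require a third-party library); the name bücher.example is
purely illustrative.

    # Encode a Unicode domain name into its ASCII-compatible (Punycode) form
    # and decode it back again.
    ace = "bücher.example".encode("idna")
    print(ace)                   # e.g. b'xn--bcher-kva.example'
    print(ace.decode("idna"))    # bücher.example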
Name servers
Main article: Name server
The
Domain Name System is maintained by a distributed database system,
which uses the client-server model. The nodes of this database are the
name servers. Each domain has at least one authoritative DNS server that
publishes information about that domain and the name servers of any
domains subordinate to it. The top of the hierarchy is served by the
root name servers, the servers to query when looking up (resolving) a
TLD.
Authoritative name server
An authoritative name server is
a name server that gives answers that have been configured by an
original source, for example, the domain administrator or by dynamic DNS
methods, in contrast to answers that were obtained via a regular DNS
query to another name server. An authoritative-only name server only
returns answers to queries about domain names that have been
specifically configured by the administrator.
In other words, an
authoritative name server lets recursive name servers know what DNS data
(the IPv4 address, the IPv6 address, a list of incoming mail servers,
etc.) a given host name (such as "www.example.com") has. As just one
example, the authoritative name server for "example.com" tells recursive
name servers that "www.example.com" has the IPv4 address 192.0.43.10.
An
authoritative name server can either be a master server or a slave
server. A master server is a server that stores the original (master)
copies of all zone records. A slave server uses an automatic updating
mechanism of the DNS protocol in communication with its master to
maintain an identical copy of the master records.
A set of
authoritative name servers has to be assigned for every DNS zone. NS
records listing the servers of that set must be stored both in the
parent zone and in the zone's own servers (as a self-reference).
When domain names are
registered with a domain name registrar, their installation at the
domain registry of a top-level domain requires the assignment of a
primary name server and at least one secondary name server. The
requirement of multiple name servers aims to make the domain still
functional even if one name server becomes inaccessible or
inoperable.[11] Which name server is designated primary is determined
solely by the order (priority) in which the servers are given to the
domain name registrar. For this purpose, generally only the fully
qualified domain name of each name server is required, unless the
servers are contained in the registered domain, in which case the
corresponding IP address is needed as well.
Primary name servers are often master name servers, while secondary name servers may be implemented as slave servers.
An
authoritative server indicates its status of supplying definitive
answers, deemed authoritative, by setting a software flag (a protocol
structure bit), called the Authoritative Answer (AA) bit in its
responses.[3] This flag is usually reproduced prominently in the output
of DNS administration query tools (such as dig) to indicate that the
responding name server is an authority for the domain name in
question.[3]
Operation
Address resolution mechanism
Domain
name resolvers determine the appropriate domain name servers responsible
for the domain name in question by a sequence of queries starting with
the right-most (top-level) domain label.
A DNS recursor consults three name servers to resolve the address www.wikipedia.org.
The process entails:
1. A network host is configured with an initial cache (so-called hints)
of the known addresses of the root name servers. Such a hint file is
updated periodically by an administrator from a reliable source.
2. A query to one of the root servers to find the server authoritative for the top-level domain.
3. A query to the obtained TLD server for the address of a DNS server authoritative for the second-level domain.
4. Repetition of the previous step to process each domain name label in
sequence, until the final step which returns the IP address of the host
sought.
The diagram illustrates this process for the host www.wikipedia.org.
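The same walk from the root can be outlined in code. The sketch below
uses the third-party Python package dnspython and a single hard-coded
root server address (198.41.0.4, a.root-servers.net); it resolves
www.example.com rather than the name in the diagram, and it omits CNAME
chasing, retries, TCP fallback, and caching, so it illustrates the
referral chain rather than serving as a usable resolver.

    import dns.message    # third-party package "dnspython" (assumed installed)
    import dns.query
    import dns.rdatatype

    ROOT_SERVER = "198.41.0.4"    # a.root-servers.net

    def resolve_iteratively(name, server=ROOT_SERVER):
        while True:
            query = dns.message.make_query(name, dns.rdatatype.A)
            response = dns.query.udp(query, server, timeout=5)
            # If the answer section carries an A record, resolution is complete.
            for rrset in response.answer:
                if rrset.rdtype == dns.rdatatype.A:
                    return rrset[0].address
            # Otherwise this is a referral: use a glue address from the
            # additional section, or resolve a referred name server from the root.
            next_server = None
            for rrset in response.additional:
                if rrset.rdtype == dns.rdatatype.A:
                    next_server = rrset[0].address
                    break
            if next_server is None:
                ns_name = response.authority[0][0].target.to_text()
                next_server = resolve_iteratively(ns_name)
            server = next_server

    print(resolve_iteratively("www.example.com"))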
The
mechanism in this simple form would place a large operating burden on
the root servers, with every search for an address starting by querying
one of them. Given how critical the root servers are to the overall
function of the system, such heavy use would create an insurmountable
bottleneck for the trillions of queries placed every day. In practice,
caching is used in
DNS servers to overcome this problem, and as a result, root name servers
actually are involved with very little of the total traffic.
Recursive and caching name server
In
theory, authoritative name servers are sufficient for the operation of
the Internet. However, with only authoritative name servers operating,
every DNS query must start with recursive queries at the root zone of
the Domain Name System and each user system would have to implement
resolver software capable of recursive operation.
To improve
efficiency, reduce DNS traffic across the Internet, and increase
performance in end-user applications, the Domain Name System supports
DNS cache servers which store DNS query results for a period of time
determined in the configuration (time-to-live) of the domain name record
in question. Typically, such caching DNS servers (also called DNS
caches) also implement the recursive algorithm necessary to resolve a
given name starting with the DNS root through to the authoritative name
servers of the queried domain. With this function implemented in the
name server, user applications gain efficiency in design and operation.
As
one example, if a client wants to know the IP address for
"www.example.com", it will send, to a recursive caching name server, a
DNS request stating "I would like the IPv4 address for
'www.example.com'." The recursive name server will then query
authoritative name servers until it gets an answer to that query (or
returns an error if no answer can be obtained); in this case the answer
is 192.0.43.10.
The combination of DNS caching
and recursive functions in a name server is not mandatory; the
functions can be implemented independently in servers for special
purposes.
Internet service providers (ISPs) typically provide
recursive and caching name servers for their customers. In addition,
many home networking routers implement DNS caches and recursors to
improve efficiency in the local network.
DNS resolvers
See also: resolv.conf
The
client-side of the DNS is called a DNS resolver. It is responsible for
initiating and sequencing the queries that ultimately lead to a full
resolution (translation) of the resource sought, e.g., translation of a
domain name into an IP address.
A DNS query may be either a non-recursive query or a recursive query:
* A non-recursive query is one in which the DNS server provides a
record for a domain for which it is authoritative itself, or it provides
a partial result without querying other servers.
* A recursive
query is one for which the DNS server will fully answer the query (or
give an error) by querying other name servers as needed. DNS servers are
not required to support recursive queries.
The resolver, or
another DNS server acting recursively on behalf of the resolver,
negotiates use of recursive service using bits in the query headers.
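The header bit in question is the Recursion Desired (RD) flag. A brief
sketch with the third-party Python package dnspython shows the two query
styles; clearing RD asks for a non-recursive answer or referral, and a
server is free to refuse recursion even when RD is set.

    import dns.flags      # third-party package "dnspython" (assumed installed)
    import dns.message

    recursive_query = dns.message.make_query("www.example.com", "A")   # RD set by default
    iterative_query = dns.message.make_query("www.example.com", "A")
    iterative_query.flags &= ~dns.flags.RD                             # clear Recursion Desired

    print(dns.flags.to_text(recursive_query.flags))
    print(dns.flags.to_text(iterative_query.flags))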
Resolving
usually entails iterating through several name servers to find the
needed information. However, some resolvers function more simply by
communicating only with a single name server. These simple resolvers
(called "stub resolvers") rely on a recursive name server to perform the
work of finding information for them.
Circular dependencies and glue records
Name
servers in delegations are identified by name, rather than by IP
address. This means that a resolving name server must issue another DNS
request to find out the IP address of the server to which it has been
referred. If the name given in the delegation is a subdomain of the
domain for which the delegation is being provided, there is a circular
dependency. In this case the name server providing the delegation must
also provide one or more IP addresses for the authoritative name server
mentioned in the delegation. This information is called glue. The
delegating name server provides this glue in the form of records in the
additional section of the DNS response, and provides the delegation in
the answer section of the response.
For example, if the
authoritative name server for example.org is ns1.example.org, a computer
trying to resolve www.example.org first resolves ns1.example.org. Since
ns1 is contained in example.org, this requires resolving example.org
first, which presents a circular dependency. To break the dependency,
the name server for the org top level domain includes glue along with
the delegation for example.org. The glue records are address records
that provide IP addresses for ns1.example.org. The resolver uses one or
more of these IP addresses to query one of the domain's authoritative
servers, which allows it to complete the DNS query.
Record caching
The
DNS resolution process reduces the load on individual servers by
caching DNS request records for a period of time after a response. This
entails the local recording and subsequent consultation of the copy
instead of initiating a new request upstream. The time for which a
resolver caches a DNS response is determined by a value called the time
to live (TTL) associated with every record. The TTL is set by the
administrator of the DNS server handing out the authoritative response.
The period of validity may vary from just seconds to days or even weeks.
As
a noteworthy consequence of this distributed and caching architecture,
changes to DNS records do not propagate throughout the network
immediately, but require all caches to expire and refresh after the TTL.
RFC 1912 conveys basic rules for determining appropriate TTL values.
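The remaining TTL of an answer can be inspected directly, for example
with the third-party Python package dnspython; the value printed depends
on the record's configuration and on how long any upstream cache has
already held it.

    import dns.resolver   # third-party package "dnspython" (assumed installed)

    answer = dns.resolver.resolve("www.example.com", "A")
    print("remaining TTL in seconds:", answer.rrset.ttl)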
Some
resolvers may override TTL values, as the protocol supports caching for
up to 68 years or no caching at all. Negative caching, i.e. the caching
of the fact that a record does not exist, is determined by name servers
authoritative for a zone, which must include the Start of Authority
(SOA) record when reporting that no data of the requested type exists.
The value of the MINIMUM field of the SOA record and the TTL of the SOA
itself are used to establish the TTL for the negative answer.
Reverse lookup
A
reverse lookup is a query of the DNS for domain names when the IP
address is known. Multiple domain names may be associated with an IP
address. The DNS stores IP addresses in the form of domain names as
specially formatted names in pointer (PTR) records within the
infrastructure top-level domain arpa. For IPv4, the domain is
in-addr.arpa. For IPv6, the reverse lookup domain is ip6.arpa. The IP
address is represented as a name in reverse-ordered octet representation
for IPv4, and reverse-ordered nibble representation for IPv6.
When
performing a reverse lookup, the DNS client converts the address into
these formats, and then queries the name for a PTR record following the
delegation chain as for any DNS query. For example, assume the IPv4
address 208.80.152.2 is assigned to Wikimedia. It is represented as a
DNS name in reverse order like this: 2.152.80.208.in-addr.arpa. When the
DNS resolver gets a PTR (reverse-lookup) request, it begins by querying
the root servers (which point to the servers of the American Registry
for Internet Numbers (ARIN) for the 208.in-addr.arpa zone). On ARIN's
servers, 152.80.208.in-addr.arpa is assigned to Wikimedia, so the
resolver sends another query to the Wikimedia name server for
2.152.80.208.in-addr.arpa, which results in an authoritative response.
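Both the construction of the reverse name and the lookup itself can be
done with the Python standard library; 208.80.152.2 is simply the
example address used above, and the result depends on the live DNS data.

    import ipaddress
    import socket

    ip = "208.80.152.2"

    # Build the reverse-lookup name: reversed octets under in-addr.arpa.
    print(ipaddress.ip_address(ip).reverse_pointer)   # 2.152.80.208.in-addr.arpa

    # Ask the system resolver for the PTR record of that name.
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        print("PTR:", hostname)
    except socket.herror:
        print("no PTR record found")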
Client lookup
DNS resolution sequence
Users
generally do not communicate directly with a DNS resolver. Instead DNS
resolution takes place transparently in applications such as web
browsers, e-mail clients, and other Internet applications. When an
application makes a request that requires a domain name lookup, such
programs send a resolution request to the DNS resolver in the local
operating system, which in turn handles the communications required.
The
DNS resolver will almost invariably have a cache (see above) containing
recent lookups. If the cache can provide the answer to the request, the
resolver will return the value in the cache to the program that made
the request. If the cache does not contain the answer, the resolver will
send the request to one or more designated DNS servers. In the case of
most home users, the Internet service provider to which the machine
connects will usually supply this DNS server: such a user will either
have configured that server's address manually or allowed DHCP to set
it; however, where systems administrators have configured systems to use
their own DNS servers, their DNS resolvers point to separately
maintained name servers of the organization. In any event, the name
server thus queried will follow the process outlined above until it
either finds a result or determines that none is available. It then
returns its results to the DNS resolver; assuming it has found a result,
the resolver caches that result for future use and hands the result
back to the software that initiated the request.
Broken resolvers
An
additional level of complexity emerges when resolvers violate the rules
of the DNS protocol. A number of large ISPs have configured their DNS
servers to violate rules (presumably to allow them to run on
less-expensive hardware than a fully compliant resolver), such as by
disobeying TTLs, or by indicating that a domain name does not exist just
because one of its name servers does not respond.[12]
As a final
level of complexity, some applications (such as web browsers) also have
their own DNS cache, in order to reduce the use of the DNS resolver
library itself. This practice can add extra difficulty when debugging
DNS issues, as it obscures the freshness of data, and/or what data comes
from which cache. These caches typically use very short caching
times—on the order of one minute.[13]
Internet Explorer
represents a notable exception: versions up to IE 3.x cache DNS records
for 24 hours by default. Internet Explorer 4.x and later versions (up to
IE 8) decrease the default timeout value to half an hour, which may be
changed in the corresponding registry keys.[14]
Other applications
The system outlined above provides a somewhat simplified scenario. The Domain Name System includes several other functions:
* Hostnames and IP addresses do not necessarily match on a one-to-one
basis. Multiple hostnames may correspond to a single IP address:
combined with virtual hosting, this allows a single machine to serve
many web sites. Alternatively, a single hostname may correspond to many
IP addresses: this can facilitate fault tolerance and load distribution,
and also allows a site to move physical locations seamlessly.
*
There are many uses of DNS besides translating names to IP addresses.
For instance, mail transfer agents use DNS to find out where to deliver
e-mail for a particular address. The domain to mail exchanger mapping
provided by MX records accommodates another layer of fault tolerance and
load distribution on top of the name to IP address mapping.
*
E-mail blacklists: The DNS is used for efficient storage and
distribution of the IP addresses of blacklisted e-mail hosts. The usual
method is to put the IP address of the subject host into a sub-domain
of a higher-level domain name and to resolve that name to different
records to indicate a positive or a negative result. Here is a
hypothetical example blacklist:
o 102.3.4.5 is blacklisted => creates 5.4.3.102.blacklist.example, which resolves to 127.0.0.1
o 102.3.4.6 is not => 6.4.3.102.blacklist.example is not found, or defaults to 127.0.0.2
o E-mail servers can then query blacklist.example through the DNS
mechanism to find out whether a specific host connecting to them is in
the blacklist (a query sketch follows this list). Today many such
blacklists, either free or subscription-based, are available mainly for
use by e-mail administrators and anti-spam software.
* Sender Policy Framework and
DomainKeys, instead of creating their own record types, were designed to
take advantage of another DNS record type, the TXT record.
* To
provide resilience in the event of computer failure, multiple DNS
servers are usually provided for coverage of each domain, and at the top
level, thirteen very powerful root name servers exist, with additional
"copies" of several of them distributed worldwide via Anycast.
*
Dynamic DNS (sometimes called DDNS) allows clients to update their DNS
entry as their IP address changes, as it does, for example, when moving
between ISPs or mobile hot spots.
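As noted in the blacklist item above, a DNSBL lookup is an ordinary
address query against a specially constructed name. The standard-library
Python sketch below uses blacklist.example as a placeholder zone name.

    import socket

    def is_blacklisted(ip, zone="blacklist.example"):   # placeholder zone name
        # Reverse the octets of the address and append the blacklist zone.
        query_name = ".".join(reversed(ip.split("."))) + "." + zone
        try:
            socket.gethostbyname(query_name)
            return True       # any answer (e.g. 127.0.0.x) means "listed"
        except socket.gaierror:
            return False      # no such name means "not listed"

    print(is_blacklisted("102.3.4.5"))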
Protocol details
DNS
primarily uses User Datagram Protocol (UDP) on port number 53 to serve
requests.[3] DNS queries consist of a single UDP request from the client
followed by a single UDP reply from the server. The Transmission
Control Protocol (TCP) is used when the response data size exceeds 512
bytes, or for tasks such as zone transfers. Some resolver
implementations use TCP for all queries.
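A compact sketch of such a UDP exchange, using only the Python standard
library; the resolver address 8.8.8.8 and the query name are
illustrative, and no retry, truncation, or TCP fallback handling is
included.

    import socket
    import struct

    def build_query(name, qtype=1, qclass=1):        # type 1 = A record, class 1 = IN
        # 12-byte header: ID, flags (RD set), QDCOUNT=1, ANCOUNT, NSCOUNT, ARCOUNT.
        header = struct.pack("!HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)
        # Question: length-prefixed labels, terminated by the zero-length root label.
        question = b"".join(
            bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
        ) + b"\x00"
        question += struct.pack("!HH", qtype, qclass)
        return header + question

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(5)
    sock.sendto(build_query("www.example.com"), ("8.8.8.8", 53))
    response, _ = sock.recvfrom(512)                  # 512 octets: the classic UDP limit
    (answer_count,) = struct.unpack("!H", response[6:8])  # ANCOUNT field of the header
    print("answer records in reply:", answer_count)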
DNS resource records
Further information: List of DNS record types
A
Resource Record (RR) is the basic data element in the domain name
system. Each record has a type (A, MX, etc.), an expiration time limit, a
class, and some type-specific data. Resource records of the same type
define a resource record set (RRset). The order of resource records in a
set, returned by a resolver to an application, is undefined, but often
servers implement round-robin ordering to achieve Global Server Load
Balancing. DNSSEC, however, works on complete resource record sets in a
canonical order.
When sent over an IP network, all records use the common format specified in RFC 1035:[15]
RR (Resource record) fields:
Field     Description                                                                  Length (octets)
NAME      Name of the node to which this record pertains                               (variable)
TYPE      Type of RR in numeric form (e.g. 15 for MX RRs)                              2
CLASS     Class code                                                                   2
TTL       Count of seconds that the RR stays valid (maximum 2^31 - 1, about 68 years)  4
RDLENGTH  Length of the RDATA field                                                    2
RDATA     Additional RR-specific data                                                  (variable)
NAME
is the fully qualified domain name of the node in the tree. On the
wire, the name may be shortened using label compression where ends of
domain names mentioned earlier in the packet can be substituted for the
end of the current domain name. A free-standing @ is used to denote the
current origin.
TYPE is the record type. It indicates the format
of the data and it gives a hint of its intended use. For example, the A
record is used to translate from a domain name to an IPv4 address, the
NS record lists which name servers can answer lookups on a DNS zone, and
the MX record specifies the mail server used to handle mail for a
domain specified in an e-mail address (see also List of DNS record
types).
RDATA is data of type-specific relevance, such as the IP
address for address records, or the priority and hostname for MX
records. Well known record types may use label compression in the RDATA
field, but "unknown" record types must not (RFC 3597).
The CLASS
of a record is set to IN (for Internet) for common DNS records involving
Internet hostnames, servers, or IP addresses. In addition, the classes
Chaos (CH) and Hesiod (HS) exist.[16] Each class is an independent name
space with potentially different delegations of DNS zones.
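After the variable-length NAME, the fixed portion of every resource
record on the wire (TYPE, CLASS, TTL, RDLENGTH) occupies exactly ten
octets. The following standard-library Python sketch packs and unpacks
illustrative values for those fields.

    import struct

    # Illustrative fixed fields: TYPE=15 (MX), CLASS=1 (IN), TTL=3600, RDLENGTH=9.
    fixed = struct.pack("!HHIH", 15, 1, 3600, 9)

    # The same ten octets unpacked, as a receiver would after reading the NAME.
    rtype, rclass, ttl, rdlength = struct.unpack("!HHIH", fixed)
    print(rtype, rclass, ttl, rdlength)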
In
addition to resource records defined in a zone file, the domain name
system also defines several request types that are used only in
communication with other DNS nodes (on the wire), such as when
performing zone transfers (AXFR/IXFR) or for EDNS (OPT).
Wildcard DNS records
Main article: Wildcard DNS record
The
domain name system supports wildcard domain names, which are names that
start with the asterisk label, '*', e.g., *.example.[1][17] DNS records
belonging to wildcard domain names specify rules for generating resource
records within a single DNS zone by substituting whole labels with
matching components of the query name, including any specified
descendants. For example, in the DNS zone x.example, a wildcard MX
record for *.x.example (pointing to the mail exchanger a.x.example)
specifies that all subdomains (including subdomains of subdomains) of
x.example use that mail exchanger. The records for a.x.example itself
are needed to specify the mail exchanger for that name. Because this has
the effect of excluding a.x.example and its subdomains from the wildcard
matches, all subdomains of a.x.example must be covered by a separate
wildcard statement.
The role of wildcard records was
refined in RFC 4592, because the original definition in RFC 1034 was
incomplete and resulted in misinterpretations by implementers.[17]
Protocol extensions
The
original DNS protocol had limited provisions for extension with new
features. In 1999, Paul Vixie published in RFC 2671 an extension
mechanism, called Extension Mechanisms for DNS (EDNS), that introduced
optional protocol elements without increasing overhead when not in use.
This was accomplished through the OPT pseudo-resource record that only
exists in wire transmissions of the protocol, but not in any zone files.
Initial extensions were also suggested (EDNS0), such as increasing the
DNS message size in UDP datagrams.
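Requesting EDNS on a query, together with a larger advertised UDP
payload size, can be sketched with the third-party Python package
dnspython; whether the larger size is honored depends on the server.

    import dns.message    # third-party package "dnspython" (assumed installed)

    # Build a query advertising EDNS version 0 and a 4096-octet UDP payload;
    # the OPT pseudo-record appears only in the wire form of the message.
    query = dns.message.make_query("www.example.com", "A", use_edns=0, payload=4096)
    print(query.to_text())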
Dynamic zone updates
Dynamic
DNS updates use the UPDATE DNS opcode to add or remove resource records
dynamically from a zone database maintained on an authoritative DNS
server. The feature is described in RFC 2136. This facility is useful to
register network clients into the DNS when they boot or become
otherwise available on the network. Since a booting client may be
assigned a different IP address each time from a DHCP server, it is not
possible to provide static DNS assignments for such clients.
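A sketch of such an UPDATE message using the third-party Python package
dnspython; the zone, the record data, and the server address 192.0.2.1
are placeholders, and a real deployment would normally also require TSIG
authentication, which is omitted here.

    import dns.query      # third-party package "dnspython" (assumed installed)
    import dns.update

    # Build an RFC 2136 UPDATE that replaces the A record of host1.example.com.
    update = dns.update.Update("example.com")
    update.replace("host1", 300, "A", "192.0.2.10")

    # Send the update to the zone's primary server (placeholder address).
    response = dns.query.tcp(update, "192.0.2.1", timeout=5)
    print(response.rcode())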
Security issues
Originally,
security concerns were not major design considerations for DNS software
or any software for deployment on the early Internet, as the network
was not open for participation by the general public. However, the
expansion of the Internet into the commercial sector in the 1990s
changed the requirements for security measures to protect data integrity
and user authentication.
Several vulnerability issues were
discovered and exploited by malicious users. One such issue is DNS cache
poisoning, in which data is distributed to caching resolvers under the
pretense of being an authoritative origin server, thereby polluting the
data store with potentially false information and long expiration times
(time-to-live). Subsequently, legitimate application requests may be
redirected to network hosts operated with malicious intent.
DNS
responses are traditionally not cryptographically signed, leading to
many attack possibilities; the Domain Name System Security Extensions
(DNSSEC) modify DNS to add support for cryptographically signed
responses. Several extensions have been devised to secure zone transfers
as well.
Some domain names may be used to achieve spoofing
effects. For example, paypal.com and paypa1.com are different names, yet
users may be unable to distinguish them in a graphical user interface
depending on the user's chosen typeface. In many fonts the letter l and
the numeral 1 look very similar or even identical. This problem is acute
in systems that support internationalized domain names, since many
character codes in ISO 10646 may appear identical on typical computer
screens. This vulnerability is occasionally exploited in phishing.[18]
Techniques such as forward-confirmed reverse DNS can also be used to help validate DNS results.
Domain name registration
The
right to use a domain name is delegated by domain name registrars which
are accredited by the Internet Corporation for Assigned Names and
Numbers (ICANN), the organization charged with overseeing the name and
number systems of the Internet. In addition to ICANN, each top-level
domain (TLD) is maintained and serviced technically by an administrative
organization, operating a registry. A registry is responsible for
maintaining the database of names registered within the TLD it
administers. The registry receives registration information from each
domain name registrar authorized to assign names in the corresponding
TLD and publishes the information using a special service, the WHOIS
protocol.
ICANN publishes the complete list of TLD registries and
domain name registrars. Registrant information associated with domain
names is maintained in an online database accessible with the WHOIS
service. For most of the more than 290 country code top-level domains
(ccTLDs), the domain registries maintain the WHOIS (Registrant, name
servers, expiration dates, etc.) information. For instance, DENIC, the
network information center for Germany, holds the DE domain data. Since
about 2001, most gTLD
registries have adopted this so-called thick registry approach, i.e.
keeping the WHOIS data in central registries instead of registrar
databases.
For COM and NET domain names, a thin registry model is
used. The domain registry (e.g., VeriSign) holds basic WHOIS data (i.e.,
registrar, name servers, etc.). The detailed WHOIS data (registrant,
name servers, expiry dates, etc.) can be found at the registrars.
Some
domain name registries, often called network information centers (NIC),
also function as registrars to end-users. The major generic top-level
domain registries, such as for the COM, NET, ORG, INFO domains, use a
registry-registrar model consisting of many domain name
registrars.[19][20] In this method of management, the registry only
manages the domain name database and the relationship with the
registrars. The registrants (users of a domain name) are customers of
the registrar, in some cases through additional layers of resellers.