The abbreviation of the World Wide Web. The most interesting facts about the world wide web and the Internet. Search engines: composition, functions, principles of work

THE WORLD WIDE WEB (abbr. WWW or the Web) is a distributed, heterogeneous system of shared hypermedia documents that operates over the Internet. The documents of this system are connected by hyperlinks into a complex branched structure, figuratively called a "web". The Web is one of the most popular Internet services (along with email, search engines, etc.).

Web pages and websites

The Web's hypermedia documents, called web pages, are an evolution of hypertext documents (see Hypertext). They can contain text, images, audio, video and other components. Each web page has a unique address, its URL (Uniform Resource Locator), at which it can be found. For example, www.webopedia.com is the URL of the main page of an online computer dictionary. Any run of contiguous characters on a web page can serve as a hyperlink to another web page or to another information resource on the Internet. Hyperlinks can point not only to documents on the Web but also to other information resources provided by Internet services. A collection of hyperlinked web pages that share a common part of the URL (and are usually related thematically) is called a website. For example, the address of the Webopedia site is www.webopedia.com, and the page of the article defining the term Web is http://www.webopedia.com/TERM/W/World_Wide_Web.html. The part of the address common to all pages of a site is the site address.
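The relationship between a page URL and its site address can be illustrated with a small sketch. It uses Python's standard urllib.parse module; the helper name site_address is an assumption made for this illustration and does not come from the article.

```python
from urllib.parse import urlsplit


def site_address(url: str) -> str:
    """Return the part of a URL shared by all pages of one site (scheme + host)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


page = "http://www.webopedia.com/TERM/W/World_Wide_Web.html"
parts = urlsplit(page)

print(parts.scheme)        # protocol used to fetch the page
print(parts.netloc)        # server (site) name
print(parts.path)          # location of the page on that server
print(site_address(page))  # common prefix of every page on the site
```

Two pages belong to the same site, in the sense used above, when site_address returns the same value for both.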

Web servers and browsers

Websites are stored on web servers (computers with special software). Computers of various architectures can act as web servers, provided they meet the requirements for reliability and performance. They can run under different operating systems and use a variety of server programs, which is why the Web is a heterogeneous computer system. The Web is globally distributed in the sense that web servers can be located anywhere in the world where a connection to the Internet is available.

To obtain web pages, the user runs a browser (a web client program). With the browser, the user composes and sends requests for the information resources of interest. The browser interprets the responses received from web servers and displays the results on the user's device (computer display, smartphone, etc.). As the number of web applications has grown (web mail, search, photo, graphics and text editors, and other application programs), browsers have come to be used as platforms for web applications. To work with any web application, the user only needs a suitable browser, the address of the application and the right to access it. As web applications multiply, browsers take on an increasing share of the operating system's tasks (e.g. Google's Chrome OS is built around the Google Chrome browser).

Messages from browsers are interpreted by server programs installed on the web servers. These programs exchange messages according to rules (protocols) that follow the client-server architecture (see Computer network). The protocol used on the Web is the Hypertext Transfer Protocol (HTTP).

The invention of the web and the implementation of the project

In March 1989, Tim Berners-Lee, a British scientist then working at the European Organization for Nuclear Research (CERN), proposed what was later called the Web project. His first assistant on the project was the systems engineer Robert Cailliau. The inventor of the Web later became the director of the World Wide Web Consortium (W3C). The goal of the W3C is to increase the potential of the Web (through the development of better protocols and technologies) and to ensure its continued sustainable growth.

From the end of 1993 (after the release of Mosaic, a browser with a graphical interface), the popularity of the Web began to grow rapidly.

By the beginning of the 21st century, the Web had become the most popular and fastest-growing service on the Internet. The ease of interacting with the Web and the thematic variety of web resources determined the scale of its use in various fields of human activity (distance learning, e-commerce, social networks, electronic publications, etc.).

Today the number of Internet users has reached 3.5 billion people, almost half of the world's population. And, of course, everyone knows that the World Wide Web has enveloped our planet. Yet not everyone can say whether there is a difference between the Internet and the World Wide Web. Oddly enough, many people are absolutely sure that the two terms are synonyms, but well-informed readers can give reasons that will shake this confidence.

What is the Internet?

Without going into complex technical details, we can say that the Internet is a system that unites computer networks around the world. Its computers are divided into two groups: clients and servers.

Clients are ordinary user devices: personal computers, laptops, tablets and, of course, smartphones. They send requests, then receive and display information.

All information is stored by servers, which can be classified according to different purposes:

  • web servers,
  • mail servers,
  • chat servers,
  • radio and television broadcasting systems,
  • file sharing servers.

Servers are powerful computers that run continuously. In addition to storing information, they receive requests from clients and send back the required responses, processing hundreds of such requests at the same time.

Our brief overview should also mention Internet providers, which provide the connection between the client and the server. A provider is an organization with its own Internet server to which all of its clients are connected. Providers offer connections over a telephone line, a dedicated channel or a wireless network.


This is how you get to the Internet

Is it possible to do without a provider and directly connect to the Internet? Theoretically, you can! You will have to become your own provider and spend a huge amount of money to get to the central servers. So don't scold your internet provider for high rates - these guys also need to pay for many things and spend money on equipment maintenance.

The World Wide Web has entangled the whole world

The World Wide Web, or simply the Web, is literally a "web": a huge number of interconnected pages. The connections are provided by links, through which you can move from one page to another even if it is located on a different computer connected to the Internet.


The World Wide Web is the most popular and largest service on the Internet

The World Wide Web relies on special web servers. They store web pages (you are looking at one of them now). Pages that are connected by links, share a common theme and appearance, and are usually located on the same server are called a website.

To view pages and documents of the web, special programs are used - browsers.

Forums, blogs and social networks are all part of the World Wide Web. But its operation and very existence depend on the Internet.

Is the difference big?

In fact, the difference between the Internet and the World Wide Web is quite large. If the Internet is a huge network that connects millions of computers around the planet so that they can share information, the World Wide Web is just one way of sharing it. Besides supporting the World Wide Web, the Internet lets you use email and various instant messengers, as well as transfer files via FTP.

The Internet is what connects numerous computer networks.

The World Wide Web is all pages that are stored on special servers on the Internet.

Conclusion

Now you know that the worldwide network, the Internet, and the World Wide Web are different things. And most importantly, you will be able to show off your knowledge and explain the difference to your friends.

The World Wide Web (abbreviated WWW) is a body of information resources that are interconnected by means of telecommunications, are based on a hypertext representation of data and are scattered around the world.

The year of birth of the World Wide Web is 1989. It was in this year that Tim Berners-Lee proposed a general hypertext project, which later became known as the World Wide Web.

The creator of the "web", Tim Berners-Lee, working at the particle physics laboratory of the European Organization for Nuclear Research (CERN) in Geneva, Switzerland, together with his partner Robert Cailliau, studied how the ideas of hypertext could be applied to build an information environment that would simplify the exchange of information between physicists.

The result of this work was a document that set out concepts of fundamental importance to the "web" in its modern form and proposed URIs, the HTTP protocol and the HTML language. The modern Internet cannot be imagined without these technologies.

Berners-Lee created the world's first web server and the world's first hypertext web browser. On the world's first website, he described what the World Wide Web is and how to set up a web server, how to use a browser, etc. This site was also the world's first Internet directory.

Since 1994, the most important work on the development of the World Wide Web has been carried out by the World Wide Web Consortium (W3C), which was organized by Tim Berners-Lee and, at the time of writing, was still headed by him. The consortium develops and implements technology standards for the Internet and the World Wide Web. The W3C mission: "To fully unleash the potential of the World Wide Web by creating protocols and principles that guarantee the long-term development of the Web." The W3C develops "Recommendations" to achieve compatibility between the software and hardware of different companies, which makes the World Wide Web more complete, universal and convenient.

Search engines: composition, functions, principles of work

A search engine is a software and hardware complex that searches the Internet and responds to a user's request, given as a text phrase (a search query), with a list of links to information sources ordered by relevance to the query. The major international search engines are Google, Yahoo and MSN. On the Russian Internet they are Yandex, Rambler and Aport.

The main characteristics of search engines are described below:

    Completeness

Completeness is one of the main characteristics of a search engine. It is the ratio of the number of documents found for a query to the total number of documents on the Internet that satisfy this query. For example, if there are 100 pages on the Internet containing the phrase “how to choose a car” and only 60 of them are found for the corresponding query, the search completeness is 0.6. Obviously, the more complete the search, the less likely it is that users will fail to find a document they need, provided that it exists on the Internet at all.

    Accuracy

Accuracy is another main characteristic of a search engine. It is determined by the degree to which the documents found match the user's query. For example, if the query “how to choose a car” returns 100 documents, of which 50 contain the phrase “how to choose a car” and the rest merely contain these words (“how to choose the right radio tape recorder and install it in a car”), the search accuracy is 50/100 (= 0.5). The more accurate the search, the faster users find the documents they need, the less "garbage" of various kinds appears among them, and the less often the documents found fail to match the query.
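The two ratios described above (known in the information retrieval literature as recall and precision) can be written as a short sketch. The function and parameter names are assumptions chosen for this illustration; the numbers are the ones used in the examples above.

```python
def completeness(found_matching: int, total_matching: int) -> float:
    """Share of all matching documents on the Internet that the engine found."""
    if total_matching == 0:
        return 0.0  # nothing to find, so the ratio is undefined; 0.0 is a convention
    return found_matching / total_matching


def accuracy(found_matching: int, total_found: int) -> float:
    """Share of the returned documents that really match the query."""
    if total_found == 0:
        return 0.0  # nothing returned; 0.0 is a convention
    return found_matching / total_found


print(completeness(60, 100))  # 60 of the 100 existing pages were found
print(accuracy(50, 100))      # 50 of the 100 returned pages contain the phrase
```

The two measures pull in different directions: returning more documents tends to raise completeness and lower accuracy, which is why both are reported.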

    Timeliness

Timeliness is an equally important component of search. It is characterized by the time that elapses between the publication of documents on the Internet and their entry into the search engine's index. For example, the day after an interesting piece of news appears, a large number of users turn to search engines with queries about it. Less than a day has passed since the news was published, yet the main documents have already been indexed and are available for search, thanks to the so-called "quick base" that large search engines maintain and update several times a day.

    Search speed

Search speed is closely related to the engine's ability to withstand load. For example, according to Rambler Internet Holding LLC, during business hours the Rambler search engine currently receives about 60 queries per second. Such a workload requires the processing time of each individual request to be reduced. Here the interests of the user and the search engine coincide: the visitor wants to get results as quickly as possible, and the search engine must process each query as quickly as possible so as not to delay the processing of subsequent queries.

    Visibility

The clarity with which results are presented is an important component of user-friendly search. For most queries, the search engine finds hundreds or even thousands of documents. Because queries are often imprecise and search is not perfectly accurate, even the first pages of search results do not always contain only the necessary information. This means that users often have to search again within the list they were given. Various elements of the search engine results page help you navigate the results. Detailed explanations of the search results page, for example for Yandex, can be found at http://help.yandex.ru/search/?id=481937.

A brief history of the development of search engines

In the initial period of the development of the Internet, the number of its users was small, and the volume of available information was relatively small as well. For the most part, only researchers had access to the Internet. At that time, the task of finding information on the Internet was not as urgent as it is now.

One of the first ways of organizing access to the network's information resources was the creation of open site catalogs, in which links to resources were grouped by topic. The first such project was the site Yahoo.com, which opened in the spring of 1994. After the number of sites in the Yahoo directory had grown significantly, the ability to search the directory for the desired information was added. It was not yet a search engine in the full sense, since the search area was limited to the resources present in the directory rather than all Internet resources.

Link directories were widely used in the past but have almost completely lost their popularity today, since even modern catalogs, huge as they are, contain information about only an insignificant part of the Internet. The largest directory on the network, DMOZ (also called the Open Directory Project), contains information on 5 million resources, while the Google search engine base consists of more than 8 billion documents.

The first full-fledged search engine was the WebCrawler project, launched in 1994.

In 1995, the search engines Lycos and AltaVista appeared. The latter was for many years a leader in the field of information search on the Internet.

In 1997, Sergey Brin and Larry Page created the Google search engine as part of a research project at Stanford University. Google is currently the most popular search engine in the world!

In September 1997, the Yandex search engine was officially announced, which is the most popular in the Russian-speaking Internet.

Currently, there are three main international search engines - Google, Yahoo and MSN, which have their own databases and search algorithms. Most other search engines (of which there are a large number) use the results of the three listed in one form or another. For example, AOL search (search.aol.com) uses a Google base, while AltaVista, Lycos and AllTheWeb use a Yahoo base.

The composition and principles of the search engine

In Russia, the main search engine is Yandex, then Rambler.ru, Google.ru, Aport.ru, Mail.ru. Moreover, at the moment, Mail.ru uses the mechanism and the search base of "Yandex".

Almost all major search engines have their own structure that is different from others. However, it is possible to single out the main components common to all search engines. Differences in the structure can only be in the form of the implementation of mechanisms for the interaction of these components.

Indexing module

The indexing module consists of three auxiliary programs (robots):

Spider - a program designed to download web pages. The spider downloads a page and extracts all internal links from it. The HTML code of each page is downloaded. Robots use the HTTP protocol to download pages. The spider works as follows: the robot sends the server a “GET /path/document” request along with some other HTTP request headers. In response, the robot receives a text stream containing service information and the document itself. The following is stored for each downloaded page:

    page URL

    the date the page was downloaded

    the HTTP header of the server response

    the page body (HTML code)
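The exchange described above can be sketched without touching the network. The code below only builds the text of a GET request and splits a canned server response into the four stored items; the function names, the record layout and the example host are assumptions made for this illustration, not the design of any real spider.

```python
from datetime import datetime, timezone


def build_get_request(host: str, path: str) -> bytes:
    """Build the text of a minimal HTTP/1.1 GET request."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes) -> tuple[str, dict[str, str], bytes]:
    """Split a raw response into status line, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


def make_record(url: str, raw: bytes) -> dict:
    """Collect what is stored for a page: URL, download date, headers, body."""
    status_line, headers, body = parse_response(raw)
    return {
        "url": url,
        "downloaded_at": datetime.now(timezone.utc),
        "status": status_line,
        "headers": headers,
        "body": body,
    }


# A canned response stands in for what a real server would send back.
canned = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
record = make_record("http://site.example/path/document", canned)
print(build_get_request("site.example", "/path/document").decode("ascii"))
print(record["status"], record["headers"], record["body"])
```

A real spider would send the request bytes over a TCP connection and read the response from it; that part is left out here so that the sketch stays self-contained.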

Crawler (a "traveling" spider) - a program that automatically follows all the links found on a page. It extracts every link present on the page, and its task is to determine where the spider should go next, based either on these links or on a predefined list of addresses. By following the links it finds, the crawler discovers new documents that are still unknown to the search engine.

Indexer (robot indexer) is a program that analyzes web pages downloaded by spiders. The indexer parses the page into its component parts and analyzes them using its own lexical and morphological algorithms. Various page elements are analyzed, such as text, headings, links, structural and style features, special service html tags, etc.

Thus, the indexing module makes it possible to crawl a given set of resources by links, download the pages encountered, extract links to new pages from the received documents and perform a complete analysis of these documents.
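The crawl loop summarized above (download a page, extract its links, queue the ones not yet seen) might look roughly like the following sketch. To keep it runnable without network access, pages come from an in-memory dictionary named FAKE_WEB; that dictionary, the example addresses and the function names are all assumptions made for this illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def crawl(start_url: str, fetch, limit: int = 100) -> dict:
    """Breadth-first crawl; `fetch` returns HTML text or None if unavailable."""
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < limit:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be downloaded
        pages[url] = html
        extractor = LinkExtractor(url)
        extractor.feed(html)
        for link in extractor.links:
            if link not in seen:  # only queue documents not yet known
                seen.add(link)
                queue.append(link)
    return pages


# Hypothetical three-page site used instead of real HTTP downloads.
FAKE_WEB = {
    "http://site.example/": '<a href="/a.html">A</a> <a href="b.html">B</a>',
    "http://site.example/a.html": '<a href="/">home</a>',
    "http://site.example/b.html": '<a href="http://other.example/">out</a>',
}

pages = crawl("http://site.example/", FAKE_WEB.get)
print(sorted(pages))
```

The `seen` set is what keeps the crawler from downloading the same page twice when pages link back to each other.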

Database

A database, or a search engine index, is a data storage system, an information array that stores specially converted parameters of all documents downloaded and processed by the indexing module.
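One common way to organize such an index is an inverted index, which maps each word to the set of documents that contain it. The article does not say which structure any particular search engine uses, so the sketch below is only an illustration of the idea; the function names and the sample documents are assumptions.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Split text into lowercase words."""
    return re.findall(r"\w+", text.lower())


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of document ids in which it occurs."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return dict(index)


documents = {
    "doc1": "How to choose a car",
    "doc2": "How to install a radio in a car",
}
index = build_index(documents)
print(sorted(index["car"]))     # documents containing "car"
print(sorted(index["radio"]))   # documents containing "radio"
```

With this layout, answering a one-word query is a single dictionary lookup instead of a scan over every stored document.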

Search Server

The search server is an essential element of the entire system, since the quality and speed of search directly depends on the algorithms that underlie its functioning.

The search engine works as follows:

    The request received from the user undergoes morphological analysis. For each document contained in the database, the textual context matching the request is generated (it will later be displayed as a snippet, that is, the text on the search results page that corresponds to the request).

    The resulting data is passed as input to a special ranking module. The data for all documents is processed, and a rating is calculated for each document that characterizes how relevant the various components of this document, stored in the search engine index, are to the query entered by the user.

    Depending on the user's choice, this rating can be adjusted by additional conditions (for example, the so-called "advanced search").

    Next, a snippet is generated: for each document found, the title, a short annotation that best matches the request and a link to the document itself are extracted from the document table, and the words found are highlighted.

    The resulting search results are transmitted to the user in the form of SERP (Search Engine Result Page) - search results page.
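The steps above can be sketched in a heavily simplified form. Real ranking modules use many signals; here the rating is just the number of times the query words occur in a document, and the snippet is a fixed-width window around the first match. Both choices, along with all names and sample documents, are assumptions made for this illustration.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text into lowercase words (a stand-in for morphological analysis)."""
    return re.findall(r"\w+", text.lower())


def rating(query_words: list[str], text: str) -> int:
    """Toy relevance rating: total occurrences of the query words."""
    words = tokenize(text)
    return sum(words.count(word) for word in query_words)


def snippet(query_words: list[str], text: str, width: int = 40) -> str:
    """Cut a window of text around the earliest query word."""
    lowered = text.lower()
    positions = [lowered.find(word) for word in query_words if word in lowered]
    if not positions:
        return text[:width]
    start = max(min(positions) - width // 2, 0)
    return text[start:start + width]


def search(query: str, documents: dict[str, str]) -> list[tuple[int, str, str]]:
    """Return (rating, url, snippet) tuples, best rating first."""
    query_words = tokenize(query)
    results = []
    for url, text in documents.items():
        score = rating(query_words, text)
        if score > 0:
            results.append((score, url, snippet(query_words, text)))
    results.sort(key=lambda item: (-item[0], item[1]))
    return results


documents = {
    "a": "how to choose a car",
    "b": "car car car radio",
    "c": "cooking recipes",
}
for score, url, text in search("car", documents):
    print(score, url, text)
```

Sorting by descending rating and then by URL keeps the order of the result page stable when two documents receive the same rating.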

As you can see, all these components are closely related to each other and work in interaction, forming a clear, rather complex mechanism for the search engine operation, which requires huge resources.

No search engine covers all Internet resources.

Each search engine collects information about Internet resources using its unique methods, and forms its own periodically updated database. Access to this database is provided to the user.

Search engines implement two ways to find a resource:

    Search by thematic catalogs - information is presented as a hierarchical structure. At the top level are general categories (“Internet”, “Business”, “Art”, “Education”, etc.); at the next level the categories are divided into sections, and so on. The lowest level consists of links to specific web pages or other information resources.

    Keyword search (index search or detailed search) - the user submits to the search engine a query consisting of keywords. The system returns to the user a list of the resources found for the query.

Most search engines combine both search methods.

Search engines can be local, global, regional, and specialized.

In the Russian part of the Internet (Runet) the most popular search engines are Rambler (www.rambler.ru), Yandex (www.yandex.ru), Aport (www.aport.ru), Google (www.google.ru).

Most search engines are implemented as portals.

A portal (from the English "portal" - main entrance, gate) is a website that integrates various Internet services: search tools, mail, news, dictionaries, etc.

Portals can be specialized (for example, www.museum.ru) or general (for example, www.km.ru).

Keyword search

The set of keywords that are being searched for is also called a search criterion or search topic.

A query can consist of either one word or a combination of words combined by operators - symbols by which the system determines what action it needs to perform. For example: the query “Moscow Peter” contains the AND operator (this is how the space is perceived), which indicates that you need to search for documents that contain both words - Moscow and Peter.
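The implicit AND operator can be illustrated as a set intersection over a word index. The tiny index and the document names below are hypothetical, and real query languages are considerably richer; this is only a sketch of the idea that every word of the query must be present.

```python
def and_query(query: str, index: dict[str, set[str]]) -> set[str]:
    """Return the documents that contain every word of the query."""
    words = query.lower().split()  # a space between words acts as AND
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical index: word -> documents in which it occurs.
index = {
    "moscow": {"doc1", "doc2"},
    "peter": {"doc2", "doc3"},
}

print(and_query("Moscow Peter", index))  # only documents with both words
```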

In order for a search to be relevant, there are a few general rules to keep in mind:

    Regardless of the form in which a word is used in the query, the search takes into account all of its word forms according to the rules of the Russian language. For example, the query “ticket” will also find other forms of the word, such as “tickets”.

    Use capital letters only in proper names, so as not to wade through unnecessary results. For the lowercase query “blacksmiths”, for example, the documents found will mention both blacksmiths and the surname Kuznetsov (which is derived from the Russian word for blacksmith).

    It is advisable to narrow your search by using a few keywords.

    If the required address is not among the first twenty found addresses, you should change the request.

Each search engine uses its own query language. To get acquainted with it, use the search engine's built-in help.

Large sites may have built-in search engines within their web pages.

Queries in such search engines, as a rule, are built according to the same rules as in global search engines, but familiarity with the help here will not be superfluous.

Advanced Search

Search engines can provide the user with a mechanism for composing a complex query. Following the Advanced Search link makes it possible to edit search parameters, specify additional parameters and choose the most convenient form of displaying search results. Below are the parameters that can be set in the advanced search of the Yandex and Rambler systems.

Parameter description | Name in Yandex | Name in Rambler
Where to look for keywords (document title, body text, etc.) | Dictionary filter | Text Search ...
What words should or should not be present in the document and how accurate the match should be | Dictionary filter | Search for query words ... Exclude documents containing the following words ...
How far apart the keywords should be | Dictionary filter | Distance between query words ...
Document date limitation | | Document date ...
Limit the search to one or more sites | Site / Top | Search documents only on the following sites ...
Limit the search by document language | | Document language ...
Search for documents containing a picture with a specific name or caption | Picture |
Search for pages containing objects | Special objects |
Search results presentation form | Issue format | Displaying search results

Some search engines (for example, Yandex) allow you to enter queries in natural language. You write what you need to find (for example: booking a train ticket from Moscow to St. Petersburg). The system analyzes the request and gives the result. If it does not suit you, switch to the query language.


2.1 Addresses and protocols

For computers to interact, each of them must have a unique numeric address. Such an address (called an IP address) consists of 4 integers ranging from 0 to 255, separated by periods, for example 190.169.200.5. It can be seen that 4 bytes are needed to encode such an address, which theoretically makes it possible to address about 4 billion computers. Each provider receives a range of such addresses from an upstream provider and uses it when working with its customers. In everyday work numeric addresses are inconvenient, so they are replaced by text (domain) addresses that are easier to understand and remember. The interaction of computers on the Web is based on so-called protocols.
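The arithmetic behind "4 bytes" and "about 4 billion" can be checked with a short sketch. It uses Python's standard ipaddress module; the helper name dotted_to_int is an assumption made for this illustration.

```python
import ipaddress


def dotted_to_int(address: str) -> int:
    """Pack the four 0-255 numbers of a dotted address into one 32-bit integer."""
    value = 0
    for part in address.split("."):
        number = int(part)
        if not 0 <= number <= 255:
            raise ValueError(f"each part must be in 0..255, got {number}")
        value = value * 256 + number  # shift left by one byte, then add
    return value


addr = ipaddress.IPv4Address("190.169.200.5")
print(list(addr.packed))                 # the four bytes of the address
print(dotted_to_int("190.169.200.5"))    # the same address as a single number
print(2 ** 32)                           # number of distinct 4-byte addresses
```

Four bytes give 2 to the power of 32 combinations, that is 4,294,967,296, which is where the figure of about 4 billion comes from.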

A protocol is a set of uniform, formalized rules according to which computers must communicate with each other, even if they are of different types and run different operating systems. The basic protocol of the Internet is TCP/IP. To implement the various network services, there are additional protocols for mail, file transfer, hypertext transfer, etc.

All web servers have special text names, or addresses, that replace numeric addresses. Such an address consists of several parts, or segments (usually 3, sometimes 4), separated from each other by dots: segment.segment.segment

The segments form a hierarchical structure (similar to a directory hierarchy on disk). The rightmost segment has the highest level and usually indicates the country or the type of organization that owns the server. For example: ru - Russia, com - a commercial organization (mainly in the USA), edu - an educational organization (universities and institutes in the USA), org - a non-governmental organization (for example, UNESCO).

The segment to the left of it often denotes the organization itself. For instance:

- www.stanford.edu - Stanford University USA,

- www.microsoft.com - Microsoft Corporation,

- www.infoart.ru - server of the Russian Information Agency Infoart.
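The hierarchy of segments can be made explicit by reading a name from right to left. The helper name domain_levels is an assumption made for this sketch; the host names are the ones from the examples above.

```python
def domain_levels(hostname: str) -> list[str]:
    """Return the segments of a domain name, highest level first."""
    return hostname.lower().rstrip(".").split(".")[::-1]


for name in ("www.stanford.edu", "www.microsoft.com", "www.infoart.ru"):
    top, organization, *rest = domain_levels(name)
    print(f"{name}: top level = {top}, organization = {organization}, rest = {rest}")
```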

Specifying a server name on the Web is the first step in finding the information you need. The second step is to search the disks of the selected server. All information available on the Internet is distributed over the disks of hundreds of thousands of servers. Naturally, all information on server disks is stored in files, which are grouped into directories that form a directory tree. Since most servers run UNIX-like operating systems, the rules for specifying a full file name differ slightly from the usual rules of MS DOS and Windows.

Russian letters and spaces cannot be used, lowercase and uppercase letters are distinct, directory names are separated by / (not \), and the root directory is denoted by /. As a result, to give the full name of a file on the Internet, you must specify:

1. The protocol used to transfer the information (when working with the WWW, it is written as http:).

2. The name of the server on the network.

3. The path to the required file on the server disk.

Such a full address is usually called the URL of a resource on the Web, or Uniform Resource Locator. Examples of URLs:

- http://www.microsoft.com/kb/softlib/prog.exe

- http://www.infoart.ru/mainmenu.

The World Wide Web is a distributed hypertext (hypermedia) information system that provides access to interlinked documents located on different computers connected to the Internet.

The World Wide Web is made up of millions of web servers. Most of the resources of the world wide web are hypertext. Hypertext documents posted on the World Wide Web are called web pages.

Several web pages, united by a common theme, design, and also linked by links and usually located on the same web server, are called a website.

To download and view web pages, special programs called browsers are used. The World Wide Web has caused a real revolution in information technology and a boom in the development of the Internet.

Often, when talking about the Internet, they mean the World Wide Web, but it is important to understand that they are not the same thing. The word “web” and “WWW” are also used to refer to the World Wide Web.

The structure and principles of the World Wide Web

The World Wide Web is formed by millions of web servers located around the world. A web server is a program that runs on a computer connected to the network and uses the HTTP protocol to transfer data.

In its simplest form, such a program receives an HTTP request for a specific resource over the network, finds the corresponding file on the local hard disk and sends it over the network to the requesting computer.
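The core of this simplest behaviour fits in one function: read the request line, look up the requested resource and build a response. In the sketch below a dictionary stands in for the local disk, and no network socket is opened; the function name, the dictionary and the choice of status codes for bad requests are assumptions made for this illustration.

```python
def handle_request(request: bytes, files: dict[str, bytes]) -> bytes:
    """Answer one HTTP request using `files` as a stand-in for the disk."""
    request_line = request.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
    parts = request_line.split()
    if len(parts) != 3:
        return b"HTTP/1.1 400 Bad Request\r\nContent-Length: 0\r\n\r\n"
    method, path, _version = parts
    if method != "GET":
        # This sketch only serves files, so other methods are not supported.
        return b"HTTP/1.1 501 Not Implemented\r\nContent-Length: 0\r\n\r\n"
    body = files.get(path)
    if body is None:
        return b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"
    header = f"HTTP/1.1 200 OK\r\nContent-Length: {len(body)}\r\n\r\n"
    return header.encode("ascii") + body


# Hypothetical "disk" with a single page.
files = {"/index.html": b"<h1>Hello</h1>"}

request = b"GET /index.html HTTP/1.1\r\nHost: site.example\r\n\r\n"
print(handle_request(request, files).decode("ascii"))
```

A real server would additionally accept connections on a socket, map request paths safely onto a directory on disk and send a Content-Type header.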

More complex web servers are able to generate resources dynamically in response to an HTTP request. To identify resources (often files or parts of files) on the World Wide Web, Uniform Resource Identifiers (URIs) are used.

To locate resources on the network, Uniform Resource Locators (URLs) are used. URLs combine URI identification with the Domain Name System (DNS): a domain name (or an IP address written directly in numeric notation) is the part of the URL that designates the computer (more precisely, one of its network interfaces) running the code of the required web server.

To view the information received from the web server, the client computer uses a special program, a web browser. The main function of a web browser is to display hypertext.

The World Wide Web is inextricably linked to the concepts of hypertext and hyperlinks. Most of the information on the Web is hypertext. To facilitate the creation, storage and display of hypertext on the World Wide Web, HTML (HyperText Markup Language) is traditionally used.

Creating hypertext markup is called layout, and the person who does it is called a webmaster (written without a hyphen). After HTML markup, the resulting hypertext is placed in a file; such an HTML file is the most widespread kind of resource on the World Wide Web.

After the HTML file is available to the web server, it is referred to as a "web page". A collection of web pages forms a website. Hyperlinks are added to the hypertext of web pages.

Hyperlinks help users of the World Wide Web to easily navigate between resources (files), regardless of whether the resources are located on a local computer or on a remote server. Web hyperlinks are based on URL technology.

World Wide Web Technologies

In general, we can conclude that the World Wide Web rests on "three pillars": HTTP, HTML and URL. Recently, however, HTML has begun to lose ground and give way to more modern markup technologies: XHTML and XML. XML (eXtensible Markup Language) is positioned as the foundation for other markup languages.

To improve the visual perception of the web, CSS technology has become widely used, which allows you to set uniform styles for multiple web pages. Another innovation worth paying attention to is the URN (Uniform Resource Name) resource designation system.

A popular concept for the development of the World Wide Web is the creation of the Semantic Web. The Semantic Web is an add-on to the existing World Wide Web, which is designed to make information posted on the network more understandable for computers. The Semantic Web is a concept of a web in which every resource in human language would be provided with a description that a computer can understand.

The Semantic Web provides access to well-structured information for any application, regardless of platform and regardless of programming language. Programs will be able to find the necessary resources themselves, process information, classify data, identify logical connections, draw conclusions and even make decisions based on these conclusions.

If it becomes widespread and is properly implemented, the Semantic Web can revolutionize the Internet. To create a computer-readable description of a resource, the Semantic Web uses the Resource Description Framework (RDF) format, which is based on XML syntax and uses URIs to denote resources.

Newer additions in this area are RDFS (RDF Schema) and SPARQL (SPARQL Protocol and RDF Query Language), a query language for quick access to RDF data.