Remove duplicate entries in k2. How to deal with duplicate pages in Joomla? JL No Doubles plugin to combat duplicate pages

Good day, everyone! If you are reading this article, then you, like many beginning web developers, have a perfectly legitimate question: how do you deal with duplicate pages in Joomla?

In this article I will try to answer this question. First, we will talk about why duplicate pages on a site are dangerous, then we will look at ready-made solutions for fighting duplicates, and finally we will look at alternative ways of dealing with them.

Why is it worth getting rid of duplicate pages?

It's no secret that Joomla, like many other content management systems, creates identical pages that are available at different addresses on the site: duplicates. To visitors these pages are harmless; they may not even be aware that they exist.

However, duplicates that are harmless to visitors can seriously affect the site's position in search results. Why does this happen?

Imagine this situation: you have written several unique articles and published them on your website. Search robots have indexed them and everything is fine, but after a while the robot finds duplicates of these pages. It does not understand that they are duplicates; to it, these are completely different pages. As a result, the originally unique content is no longer unique.

When a large number of identical pages (duplicates) appear on a site, its position in search results can drop considerably. That is why it is worth getting rid of duplicate pages as quickly as possible.

Why do duplicates appear?

In Joomla, duplicates often appear as a result of installing additional extensions, but this is far from the only cause.

An equally common cause of duplicate pages is a poorly thought-out site structure. How is that connected, you ask? It's very simple: you created several categories with one parent category on the site and placed articles in them. Inside the articles you link to previously published articles, and so on. See the screenshot for an example:

If you do not create menu items for the categories, a page address may look like this:

http://joom4all..php?option=com_content&view=article&id=38&catid=10

This address is far from ideal, and you want to get rid of the numbers in the address bar. To do this, you create menu items for the categories and the article. After that, the address looks much more attractive:

http://site/sites-creation/basics/intro

Everything would be fine, but the old page addresses have not gone away; they remain as duplicates. As a result, the same page can have several addresses at once:

http://site/32-sites-creation/10-basics/38-intro
http://site/32-sites-creation/basics/intro
http://joom4all..php?option=com_content&view=article&id=38&catid=10

Not a pleasant situation, is it? Moreover, search robots consider all of these to be completely different pages.

Ways to deal with duplicate pages

Of course, it is best to prevent duplicate pages from appearing in the first place: think through the site structure in advance and create the necessary categories and menu items. But as practice shows, not everyone does this, and over time the question arises of how to remove duplicates from search results.

There are several options for dealing with duplicates:

  • Using special extensions
  • The robots.txt file
  • A 301 redirect in the .htaccess file
  • Removing unnecessary addresses via the webmaster panel
  • The Redirection component

JL No Doubles plugin to combat duplicate pages

The simplest and at the same time quite effective way to combat duplicate pages is to use extensions. One such extension is the JL No Doubles plugin.

The plugin is very lightweight and does not require any special configuration. All you need to do is download the plugin, install it and enable it. It will do the rest of the work itself.

The JL No Doubles plugin removes duplicate pages in more than twenty components, including com_content. You can configure it either to return a 404 error or to perform a 301 redirect to the correct page of the site. The plugin settings page looks like this:

There are only five parameters for configuring the plugin:

  • License key: activates the plugin for the K2, VirtueMart and Zoo components.
  • Multiplicity of limits: a setting for Joomla content categories. It lets you set up a redirect for links like /advanced?start=3; all you need to do is specify the number of articles displayed per page in the category.
  • Use 301 redirects: enables a redirect to the correct page or, if set to “No”, returns a 404 error. If your site has existed for a long time and other sites link to its pages, I recommend enabling the redirect so as not to lose the weight of those pages.
  • Alias: substitutes an alias for links like component/content/article (home by default).
  • Stop words: strings which, when found in a page address, exclude that address from processing.
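To make the “Multiplicity of limits” idea concrete, here is a minimal Python sketch. It assumes the rule works like this: with N articles per category page, a start offset that is not a multiple of N (such as /advanced?start=3 with N = 5) duplicates a real page and should be redirected to the nearest page boundary below it. The function name and the rounding rule are our assumptions for illustration, not the plugin's actual code.

```python
from typing import Optional


def canonical_start(start: int, limit: int) -> Optional[int]:
    """Return the start offset to redirect to, or None if `start` is already valid.

    Assumed rule (hypothetical): valid offsets are non-negative multiples of
    `limit`, the number of articles shown per category page; any other offset
    is rounded down to the previous page boundary.
    """
    if limit <= 0:
        raise ValueError("limit must be a positive number of articles per page")
    if start < 0:
        return 0  # negative offsets make no sense; send them to the first page
    if start % limit == 0:
        return None  # already a real page, no redirect needed
    return (start // limit) * limit


# With 5 articles per page: start=3 -> 0, start=12 -> 10, start=10 stays as is.
for offset in (3, 12, 10):
    print(offset, canonical_start(offset, 5))
```

In this sketch a result of 0 corresponds to the first page, i.e. the category address without ?start.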

We’ve sorted out the basic settings, now let’s go to the “Components” tab:

On this tab, select the components that are used on the site and are open for indexing. By default, com_content is already selected. Do not select every available component, especially ones you do not use: this only creates unnecessary load on your site.

Using the Robots.txt file

No matter how good the redirect plugin is, it will not get rid of every duplicate. In that case, you can forbid search robots from indexing certain pages of the site, in other words, block their access to those pages.

We have already discussed the intricacies of setting up the robots.txt file in this article. Briefly: the Disallow directive hides part of the site from the robot's “eyes”.

301 redirect and htaccess file

Another common way to deal with duplicates is to set up a 301 redirect to the correct page. This can be done in the .htaccess file.

To create a redirect, use the RewriteRule directive, but first make sure that the mod_rewrite module is enabled on your hosting.

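For example, let's redirect index.php to the main page of the website. To do this, in the .htaccess file, after the RewriteEngine On directive, add the following lines:

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.php\ HTTP/
RewriteRule ^index\.php$ http://site/ [R=301,L]

In .htaccess, RewriteRule matches the path without the domain, so a full URL in the pattern never matches. The condition on THE_REQUEST limits the redirect to requests where the visitor actually asked for /index.php, so Joomla's own internal rewriting of SEF addresses to index.php does not cause a redirect loop.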

We discussed working with the “.htaccess” file in more detail in this article.

Redirection Component

The Redirection component is quite useful and is built into Joomla 3; it lets you configure page redirects manually. To work, it relies on a special plugin, which is disabled by default.

First you need to enable that plugin. Fortunately, you don't have to hunt for it: when you open “Components” -> “Redirection”, you will see a message saying the plugin must be enabled, along with a link to activate it.

After the plugin is enabled, you can create a redirect by specifying the starting (old) and ending (new) address of the page:

This method is good when there are not too many duplicate pages.
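Conceptually, each entry you create in the Redirection component is one row in a table of old addresses and new addresses, and a request for an old address gets a 301 response. Below is a minimal Python sketch of that lookup, using the addresses from the example earlier in this article; it illustrates the idea only and is not the component's real code.

```python
from typing import Dict, Optional, Tuple

# Hypothetical redirect table: old (duplicate) path -> new (correct) path.
# Every entry is one row you would add by hand in the component.
REDIRECTS: Dict[str, str] = {
    "/32-sites-creation/10-basics/38-intro": "/sites-creation/basics/intro",
    "/32-sites-creation/basics/intro": "/sites-creation/basics/intro",
}


def resolve(path: str, table: Dict[str, str] = REDIRECTS) -> Optional[Tuple[int, str]]:
    """Return (301, new_path) for a known old path, or None to serve the page normally."""
    target = table.get(path)
    if target is None or target == path:
        return None  # unknown path, or an entry pointing at itself
    return 301, target


print(resolve("/32-sites-creation/10-basics/38-intro"))  # a listed duplicate
print(resolve("/sites-creation/basics/intro"))           # the correct page itself
```

Because every row is entered manually, the table stays manageable only while the list of duplicates is short.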

Let's sum it up

In conclusion, I would like to say that although Joomla is notorious for creating duplicate pages, there are many ways to get rid of them. You can decide for yourself which method suits you best, but in my experience a combined approach works better. And you should start with a properly designed site structure.

In addition, it's worth getting rid of index.php in the site's addresses; this will also help reduce the number of duplicate pages.

Building and promoting websites in static HTML is becoming less and less popular, and most webmasters are switching to modern CMSs, in particular Joomla. Besides its advantages, Joomla can also upset its users, above all with duplicate pages. Duplicate pages are a real scourge for Joomla webmasters, although, frankly, many other content management systems are guilty of this too.

Search for duplicates

First, let's see how to find duplicate pages and why they hurt website promotion. The easiest way to find duplicates is Yandex's advanced search: enter your site in the “site” field and a query in the search field. The results show the site's pages ranked by relevance, and among them you may spot pages that duplicate each other. It's even easier to use Netpeak Spider, which crawls the site's pages and finds duplicates among them in one click.
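A crawler such as Netpeak Spider does this at scale, but the core idea fits in a few lines: normalize each page's text, hash it, and group URLs that share a hash. A minimal Python sketch, assuming you already have the page bodies (fetching is left out). Exact hashing only catches exact duplicates, and real templates may differ in menus or modules from one URL to another, so in practice you would hash only the article body.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, List


def fingerprint(html: str) -> str:
    """Hash the visible text of a page, ignoring tags, case and whitespace differences."""
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping
    text = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_duplicates(pages: Dict[str, str]) -> List[List[str]]:
    """Group URLs whose normalized content is identical; return only groups of 2+ URLs."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url, html in pages.items():
        groups[fingerprint(html)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


pages = {
    "http://site/sites-creation/basics/intro": "<h1>Intro</h1><p>Hello</p>",
    "http://site/32-sites-creation/10-basics/38-intro": "<h1>Intro</h1>\n<p>Hello</p>",
    "http://site/sites-creation/basics/other": "<h1>Other</h1>",
}
print(find_duplicates(pages))
```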

The second important question is why search engines view duplicates negatively: after all, they are not a deliberate attempt to deceive search engines, but technical problems of the CMS that robots, in principle, should know about. The point is that robots may treat such pages as deliberate spam, since the same content is in fact served at two different addresses. Writing to Yandex support gets you nowhere, so you should try to avoid duplicate pages in Joomla.

Deleting duplicates

Add the following directives to robots.txt:

Disallow: /search/
Disallow: /*.pdf
Disallow: /*print=1
Disallow: /*type=atom
Disallow: /*type=rss
Disallow: /*task=rss
Disallow: /*?sl*

This cuts off the main places where Joomla tends to generate duplicates. If you really need some of these pages to stay open, for example for the Xmap component in order to submit a sitemap to the webmaster panel, you can easily open them with the Allow: directive, placed before Disallow:.
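To check which addresses such rules actually block, it helps to know how Google and Yandex read them: `*` means “any sequence of characters”, a trailing `$` means “end of URL”, and when several rules match, the longest one wins (Allow wins a tie). Python's standard urllib.robotparser only does plain prefix matching, so here is a small hypothetical matcher written for illustration; the Allow entry for a PDF file is an invented example.

```python
import re
from typing import List, Tuple

Rule = Tuple[str, str]  # ("allow" or "disallow", path pattern)


def _pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """'*' matches any run of characters; a trailing '$' anchors the end of the URL."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path: str, rules: List[Rule]) -> bool:
    """The longest matching pattern wins; on a tie Allow wins; no match means allowed."""
    best_len, allowed = -1, True
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty Disallow: blocks nothing
        if _pattern_to_regex(pattern).match(path):
            is_allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


RULES: List[Rule] = [
    ("disallow", "/search/"),
    ("disallow", "/*.pdf"),
    ("disallow", "/*print=1"),
    ("disallow", "/*type=rss"),
    ("allow", "/docs/price-list.pdf"),  # hypothetical file we want to keep indexed
]

for p in ("/search/?searchword=joomla", "/docs/manual.pdf",
          "/docs/price-list.pdf", "/sites-creation/basics/intro?print=1",
          "/sites-creation/basics/intro"):
    print(p, "allowed" if is_allowed(p, RULES) else "blocked")
```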

However, what personally gave me no peace were Joomla duplicates like these:

A long search led to nothing except manually blocking hundreds of bogus links in robots.txt. Then one day the answer came like an epiphany and opened my eyes to a very simple thing that, I know for sure, I was not the only one to run into. Many people today (rightly) promote their websites on social networks by installing sharing buttons. However, not everyone notices that some plugins truncate the link when sharing to Twitter. Robots follow the tweeted link, land on its truncated version and, in their electronic ignorance, index it. To solve the problem, you need to configure or replace the plugin.

The problem turned out to be dead simple. It's a pity that only some of the duplicates go away this way, although this was the part that worried me most. It turns out that some optimization errors come from a combination of the webmaster's oversight and CMS flaws, so they can and should be dealt with. Good luck.

If you are bothered by duplicates like /sobstven-sate/eksperiment-seo/383.html, that is, shortened page addresses, use the Shnodoubles plugin for Joomla; you can find it by googling or by asking me in the comments. With it I solved the problem completely in a matter of minutes. After some thought, and tired of answering requests in the comments, I now offer nodoubles for Joomla as a direct download from this site.

I also offer a video on removing duplicates in Joomla with a 301 redirect:

Questions and answers

Is it possible to get rid of duplicates automatically?

In automatic mode you can get rid of about 90% of duplicates. For Joomla, it is enough to configure robots.txt and .htaccess and to make sure pages are merged into the main navigation. However, as the site grows, new duplicates may appear, so keep track of them with Netpeak Spider.

Don't search engine spiders understand that duplicates in Joomla are the developers' mistake?

Then why doesn't the site owner fix it? If you buy a car with a defect, would you complain when the traffic police fine you because the headlights don't work or the exhaust doesn't meet emission standards? There is no point in contacting support either, since the CMS is free.

Nowadays search engines rarely penalize duplicates harshly, but... If, for example, because of duplicate pages you have 3-4 documents in the search results with the same content but different URLs, do you think each of them will get the maximum static weight, or will the weight be spread among them? In the end, it's up to you to decide whether you need a decorative junk site, or whether you want to share information with users and profit from it by configuring the CMS correctly.

In this article I want to talk about duplicate pages in Joomla. A lot has been written on this topic, but I think it is worth setting down my own view of the problem. I will focus on Joomla 3, although almost all the tips also apply to Joomla 2.5.

The problem of duplicate pages in Joomla goes deep into the roots of the CMS, or, to be precise, not even into Joomla itself but into its predecessor, Mambo. Back then nobody thought about human-readable (SEF) URLs, and when the problem became pressing and Joomla 1.5 was released, instead of radically reworking the link system, the developers applied a patch that we are still dealing with. At the time it looked like a solution, but as we can see, the half-measure has grown into a global problem.

Fortunately, Joomla developers understand that there is a problem, but do not want to take radical measures, which, by the way, were proposed by the community. There was even a successful fundraiser for a new Joomla router, but the changes are having a hard time making it into the main Joomla distribution.

So what does Joomla actually do to avoid duplicates?

Once again they chose a half-measure and introduced the canonical tag, which is meant to point to the real page. I won't go into the thorny path of its implementation; I'll only note that it really was thorny. This method does reduce the number of duplicates on the site, but the trouble is that it hardly works for Joomla components, because component developers have to implement support for it, and implement it correctly, which does not always happen. And to be honest, the canonical tag itself is not a panacea.

Below I will describe a simple and effective method.

In fact, this method can reduce the number of duplicates several times over. In my practice, it cut duplicates tenfold.

What do we need for this?

  • A little time and a pair of hands

The first thing Google recommends is to eliminate the duplicate domain.

How to do it?

Redirect from the www domain to a non-www domain. That is, we go to the site www.site.ru, and we are redirected to the site site.ru.

Add the following rule to the .htaccess file.

RewriteCond %{HTTP_HOST} ^www\.site\.ru$
RewriteRule ^(.*)$ http://site.ru/$1 [R=301,L]

Replace site.ru with your domain.
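The rule above keeps the path and query string and only drops the www. prefix. A minimal Python sketch of the same mapping, handy for checking which address a given URL should end up at; the function name is hypothetical, and unlike the rule, which always redirects to http://, this sketch keeps the original scheme.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def non_www_target(url: str) -> Optional[str]:
    """Return the address a www URL should be 301-redirected to, or None if no redirect is needed.

    The path and query string are kept; only the 'www.' prefix is removed.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit lower-cases the hostname
    if not host.startswith("www."):
        return None
    netloc = host[len("www."):]
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, ""))


print(non_www_target("http://www.site.ru/sites-creation/basics/intro?start=10"))
print(non_www_target("http://site.ru/"))
```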

Now let's move on to the pleasant stuff.

Install the JL No Doubles plugin and enable it in the plugin manager. If your site uses only standard Joomla articles, you don't need to configure anything. This short series of simple steps will radically reduce the number of duplicates on the site.

Earlier we talked about why duplicates occur and how to find them. In this article I'll tell you how to remove duplicates or keep them out of search results.

Since every case is individual, we will look at the most popular methods, which work perfectly in 99% of cases. You can choose one of them or combine several.

Although all these methods apply to any CMS, I will dwell in detail on the specifics of Joomla.

All of these examples assume that the standard Search Engine Friendly URLs and Use URL Rewriting options are enabled in Joomla's global configuration.

  • 1. Plugin for Joomla

The first thing you can do if you have confusion in the URLs (when links are formed from both the category alias and the menu item) is to install the Shnodoubles plugin from sherza.

It is an excellent plugin that does its job perfectly. After installation, an incorrectly formed link (from the category alias) is redirected to the correct one (from the menu item). You can download this wonderful plugin that eliminates duplicates for Joomla 2.5 (direct link!)

After the plugin is activated, some of the duplicates will simply be merged.

  • 2. Robots.txt for Joomla

This file comes with the standard Joomla distribution, is located in the site root and is available at site.ru/robots.txt. Its main purpose is to give search robots instructions for indexing the site. With it you can block selected sections of the site; wildcards (* and $) are also supported, so you can block individual pages by a mask.

Most often I use this instruction (in addition to what comes in the default file):

This single line gets rid of a large amount of junk, such as:

  • print versions of articles (their URLs contain print= or tmpl=component)
  • RSS feed links
  • site search results pages
  • pagination pages
  • other pages, depending on the extensions used

Whether you use this line or block each type of page separately is up to you, but keep in mind that Yandex treats an overly large robots.txt file as fully permissive. Also make sure this line does not block anything important, such as the sitemap; in that case you can add: Allow: /path_to_map

You can read more about using robots.txt in Yandex help - help.yandex.ru/webmaster/?id=996567

  • 3. The rel="canonical" attribute of the <link> tag

This attribute helps the robot decide which pages should be included in the index and which should not. If the site has very similar pages (fuzzy duplicates) that differ only, for example, in sorting options (newest first, ascending order, 20 or 30 items per page, etc.), you can use this attribute. Choose one canonical page that will be ranked and, on all the others, add rel="canonical" pointing to it; search engines will then generally leave those documents out of the index and the search results.

For more information on how to implement rel="canonical" in Joomla 1.7/2.5, see
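The same idea can be expressed as a small function: strip the presentation-only parameters to get the canonical address, then emit the tag for the page's <head>. A minimal Python sketch; the parameter names in IGNORED_PARAMS are hypothetical examples, so substitute the ones your components actually generate.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical presentation-only parameters (sort field, sort direction, page
# size). Replace them with the parameters your components actually use.
IGNORED_PARAMS = {"order", "dir", "limit"}


def canonical_url(url: str) -> str:
    """Drop presentation-only parameters so every sorting variant maps to one URL."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in IGNORED_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element to place in the page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}" />'


print(canonical_link_tag("http://site.ru/catalog?order=price&dir=asc&limit=20&page=2"))
```

Pagination (page=2 here) is deliberately kept, since each page of a list shows different content.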

  • 4. 301 redirect

Use it if you have changed page addresses but the documents still exist, that is, you have not deleted them. In this case, for proper merging, it is recommended to set up a 301 redirect in .htaccess so that search engines know the document has moved to a new address. This method preserves the site's metrics: TIC and PR.

301 redirects can also be used to merge duplicates. For example, the well-known duplicates of the main page of a site on Joomla are /index.php and the alias of the Home menu item, for example, /home or /homepage

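Merging them is quite simple: open .htaccess and, after the RewriteEngine On directive, add

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.php\ HTTP/
RewriteRule ^index\.php$ http://site.ru/ [R=301,L]

A plain Redirect 301 /index.php http://site.ru/ line is often suggested instead, but on Joomla it can loop, because Joomla's own rewrite rules pass every SEF address to index.php internally; the condition on THE_REQUEST redirects only when a visitor actually requests /index.php.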

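Or you can do the 301 redirect in PHP, at the very top of your template's index.php file:

<?php
// Permanently redirect the bare /index.php to the site root.
// REQUEST_URI includes the query string, so /index.php?option=... is not affected.
if ($_SERVER["REQUEST_URI"] == "/index.php") {
    header("Location: /", true, 301);
    exit();
}
?>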

And here is the classic redirect from www to the non-www domain:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.ru$
RewriteRule ^(.*)$ http://example.ru/$1 [R=301,L]

* Replace example.ru with your domain name.

  • 5. Meta robots tag

Another way to prevent duplicates from being indexed in Joomla is to use a meta tag:

At the moment, this method works better for Google than Disallow rules in robots.txt. For example, to keep print pages and duplicates with ?tmpl=component out of the index, you can open the component.php file in the root of your template and add this tag inside <head>.

To block search results pages of the standard com_search component, you can add a condition to your template's index.php

But first you need to define the variable:

$option = JRequest::getVar("option", null);

I won't go deeper into conditions in templates, since that is not what this article is about; I hope the principle is clear.
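The decision the template makes can be sketched as a standalone function: given a request, return a noindex meta tag for print views (tmpl=component, print=1) and com_search results, and nothing otherwise. This is a hypothetical Python illustration; the content value "noindex" is our assumption, and the sketch reads the parameters from the query string, whereas with SEF URLs enabled a Joomla template would read the resolved option from the request, as the $option line above does.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Assumed tag content: "noindex" keeps the page out of the index.
NOINDEX_TAG = '<meta name="robots" content="noindex" />'


def robots_meta_for(url: str) -> Optional[str]:
    """Return a noindex meta tag for print views and com_search results, else None."""
    query = parse_qs(urlsplit(url).query)
    first = {key: values[0] for key, values in query.items()}
    is_print_view = first.get("tmpl") == "component" or first.get("print") == "1"
    is_search = first.get("option") == "com_search"
    return NOINDEX_TAG if (is_print_view or is_search) else None


print(robots_meta_for("http://site.ru/index.php?option=com_content&view=article&id=38&tmpl=component&print=1"))
print(robots_meta_for("http://site.ru/index.php?option=com_search&searchword=joomla"))
print(robots_meta_for("http://site.ru/sites-creation/basics/intro"))
```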

  • 6. Removing url from the panel

Another quick way is manual removal from the webmaster panel.

For Yandex you need to go to the address - webmaster.yandex.ru/delurl.xml

There is one more, less popular method for removing Joomla duplicates from search results, and we will look at it too.

  • 7. X-Robots-Tag Headers

This is a fairly rare HTTP header, used mostly by foreign optimizers; it works for Google. Unfortunately, Yandex has not yet commented on support for this header.

HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
...
X-Robots-Tag: noindex
...
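Because the instruction travels in the response headers, it needs no change to the page markup. A minimal sketch as a toy Python WSGI app that adds X-Robots-Tag: noindex to print views; the app, its page body and the print-view check are hypothetical. On Apache you would more likely set the header with mod_headers (for example, Header set X-Robots-Tag "noindex" inside a suitable condition).

```python
from typing import Callable, Dict, Iterable, List, Tuple
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults

Headers = List[Tuple[str, str]]


def app(environ: Dict, start_response: Callable) -> Iterable[bytes]:
    """Toy WSGI app: serve a page and add 'X-Robots-Tag: noindex' to print views."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    headers: Headers = [("Content-Type", "text/html; charset=utf-8")]
    if query.get("tmpl") == ["component"] or query.get("print") == ["1"]:
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"<html><body>article</body></html>"]


def response_headers(query_string: str) -> Headers:
    """Call the app in-process with a minimal fake request and collect its headers."""
    environ: Dict = {"QUERY_STRING": query_string}
    setup_testing_defaults(environ)  # fills in the other required WSGI keys
    captured: Headers = []

    def start_response(status: str, headers: Headers, exc_info=None) -> None:
        captured.extend(headers)

    app(environ, start_response)
    return captured


print(response_headers("option=com_content&view=article&id=38&tmpl=component"))
print(response_headers("option=com_content&view=article&id=38"))
```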

As you can see, there are many ways to remove duplicate Joomla content; you should at least roughly understand how each of them works in order to choose the most suitable option and apply it to your situation.
