15 Replies Latest reply on Sep 23, 2013 1:25 PM by mnevill

    Bot/crawler pageview spike

    mnevill

      Last month we saw a HUGE spike in our bot/crawler pageviews.  The two culprits were Google and Northern Light.  Google jumped up 52% and Northern Light increased 71% from the previous month.  Does anyone have any insight why that would happen if we have not increased our content by any significant amount from last month?   Has anyone else come across this before?

       

      CC: Jive Analytics Jive Analytics User Group

       

      Thanks,
      Matt

        • Re: Bot/crawler pageview spike
          mnevill

          Jason Lax or LG . I have seen you guys talk quite a bit about Google and searches in general.  Do you guys have any insight on this?  Any thoughts or speculations would be appreciated.

           

          Thanks,
          Matt

            • Re: Bot/crawler pageview spike
              JasonLax

              This could be a good thing, unless the additional load is having an impact on site performance. There was an update to Google's algorithm back in June that may have pushed up your rank.

               

              Beyond that, there might be some technical reasons: robots.txt is missing or changed (paths no longer blocked), or place permissions changed (e.g. a space and all the content within it is now available to anonymous visitors). Also look into an increase in the number of backlinks.

                • Re: Bot/crawler pageview spike
                  mnevill

                  That narrowed things down quite a bit.  We did not change anything with robots.txt or permissions, and added our normal amount of monthly content.  From that it sounds like this could have been backlinks and/or the algorithm change.  I had no idea it was updated. Thank you so much for the help!

                   

                  Also, I'm curious if anyone else out there with an open external community saw a drastic increase in bot/crawler views from Google too?

              • Re: Bot/crawler pageview spike
                mnevill

                Well, hmmmm. I just got the July report from Jive hosting and the same thing happened again.  I also investigated backlinks to the best of my ability and they don't seem high.  If this was a result of the Google algorithm change would it be likely for this to happen two months in a row? 

                  • Re: Bot/crawler pageview spike
                    Chrisbrenschmidt

                    So we had this problem either last year or the year before.  The bot traffic was causing us to exceed our license.  It was incredible (and not in a good way). 

                     

                    What we eventually did is that we put a robots.txt file in place for our user profiles. We have a huge number of 'registered' users but most of them are not active and many don't want to be found on Google. This has helped our traffic immensely.  

                     

                    Christine

                      • Re: Bot/crawler pageview spike
                        mnevill

                        Christine - thanks for the tip! Now that you mention it, there are some areas that may make sense to hide from bots.  I may be wrong, but I don't know that this will help us much, since we only have 5,000+ users right now and we are seeing 300,000+ bot/crawler views, most of which come from Google.  It is just perplexing that for both June and July these numbers were double anything we have seen before.

                      • Re: Bot/crawler pageview spike
                        it2000

                        If Google did change something, then it will likely stay that way, so you should see a lot of Google traffic this month as well. Since you already use the "Crawl-delay" directive in robots.txt, you could increase its value.

                        On the other hand - now that you have more hits by Google - you could try to optimize the forum to get a better page rank.

                         

                        PS: I had never heard of Northern Light before. Compared to Google it finds nothing.
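
For reference, here is a minimal sketch of how a crawler-side parser reads the Crawl-delay directive, using Python's standard urllib.robotparser. The robots.txt content, the 10-second value, and the "ExampleBot" name are assumptions for illustration only, not this community's actual settings.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the delay value is illustrative only.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay in seconds that applies to user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


# "ExampleBot" is a made-up crawler name that falls under the "*" group.
print(crawl_delay_for(ROBOTS_TXT, "ExampleBot"))
```

Not every crawler honors Crawl-delay. Google's documentation says Googlebot does not use it, so raising the value would mainly slow down other bots.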

                          • Re: Bot/crawler pageview spike
                            mnevill

                            LG. - Thanks for the input.  As you stated this may be the new norm.  It is just strange to me that more people are not buzzing about this here in the JC, as I would assume it greatly impacts other external communities too.

                             

                            Sorry for all the questions on this, but hopefully it will help someone else later too.  Here are a few more I have:

                             

                            • If I filter by our site within Google it only shows 42k results, but Googlebot seems to account for 147k hits.  Is this related to /thread and /message being indexed separately (if they are), even though they live together on the same page?  If so, is there a downside to blocking /message?  Our content does show up in Google easily and I don't want to do anything to prevent that, especially since roughly half our traffic comes in through Google. 

                             

                            • Once a base index of a site is built, do bots have to re-crawl the entire site each month to determine changes, or is there another method to keep things in sync between the site and the search engine?  I know from Google Alerts that Google seems to pick up new content within a day or two, so is there some sort of daily mini-index that happens as opposed to a full index?  Sorry for all my ignorance here.  If there is a good site that explains how all of this works, feel free to point me to it.

                             

                            • Is the crawl-delay mostly to prevent the bot traffic from causing site performance problems related to traffic load, or does it serve another purpose too?

                             

                            Thanks,
                            Matt

                              • Re: Bot/crawler pageview spike
                                it2000

                                If you use the canonical link tag in your community pages, then Google may fetch many more URLs than it lists, because it returns only the canonical URLs. The 42k number is just an estimate; Google Webmaster Tools may be the better choice. Are the 142k hits unique hits, or did Google fetch 142k unique URLs?

                                 

                                You may need both paths, /message and /thread, to allow Google to crawl all pages and to improve link power for recent pages. This leads to duplicate content ... hopefully one day Jive will fix this by adding canonical tags to every page.

                                 

                                Google has a full index. Depending on how often the content changes, how many new pages (forum threads, documents) are added, and how relevant the site is (links, searches, clicks), Google will crawl the domain or parts of it again.

                                 

                                I assume that the crawl-delay should be used to protect the domain. Every search bot that crawls one page after another counts as a "concurrent online" user, and depending on the server this may cause a high load if multiple bots are active.
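
To illustrate the duplicate-content point, here is a minimal sketch that reads the canonical URL a page declares, using Python's standard html.parser. The canonical declaration is a link element with rel="canonical" in the page head. The host name, the IDs, and the HTML snippets are made up for the example, and they assume the pages already carry canonical tags, which the post above says Jive pages may not.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def find_canonical(html):
    """Return the canonical URL declared in html, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical pages: a thread view and a message view of the same discussion.
CANONICAL = "https://community.example.com/thread/12345"
HEAD = '<html><head><link rel="canonical" href="%s"></head></html>' % CANONICAL
PAGES = {"/thread/12345": HEAD, "/message/67890": HEAD}

canonicals = {find_canonical(html) for html in PAGES.values()}
print(len(PAGES), "fetched URLs ->", len(canonicals), "canonical URL(s)")
```

With tags like these in place, a crawler can fetch both paths but index only one of them, which is one way a hit count can end up much larger than the result count.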

                                  • Re: Bot/crawler pageview spike
                                    mnevill

                                    LG . - I just installed Google Webmaster Tools to shed some more light.  I don't know if the 142k were unique hits vs URLs. That number came from Jive hosting when I inquired about what made up the Bot/Crawler count they report.  I assume it is any pageview sourced from a bot.  Thanks for the advice on /message and /thread.  I will leave them both as they are.

                                     

                                    Chrisbrenschmidt -  Looking more into what you reported I found that a significant amount of the results count estimate from Google comes from /people. Just out of curiosity how many users did you have at the time you implemented the filter and did it seem to have any adverse effects?

                                      • Re: Bot/crawler pageview spike
                                        it2000

                                        We still have a problem with "spam users" who use the profile fields to promote their domain, even though we block /people for spiders.

                                        I'd block /people anyway unless you want users to find your member profiles in Google. Most of the time users want to find solutions for their problems with a product, so the discussions, documents and blogs are what matter.
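
A minimal sketch of checking such a rule before deploying it, using Python's standard urllib.robotparser. The robots.txt content, the host name, and the example IDs are assumptions; only the /people, /thread, and /message path prefixes come from this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that hides member profiles from all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /people
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


BASE = "https://community.example.com"  # placeholder host
for path in ("/people/some-user", "/thread/12345", "/message/67890"):
    print(path, is_allowed(ROBOTS_TXT, "Googlebot", BASE + path))
```

Only the profile URL should come back as blocked; discussions stay crawlable. Note that robots.txt only asks well-behaved crawlers to stay away, so it does nothing against spam accounts filling in profile fields.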

                                          • Re: Bot/crawler pageview spike
                                            mnevill

                                            I don't know if it was purely the result of blocking /people, but our numbers have dropped back to a more normal range again.  Now that I have some data from Webmaster Tools, is there an easy way to tell how many Googlebot views I had for each month?  I want a simple way to compare what it says with the Jive hosting report.  When I go to Crawl > Crawl Stats and look at the first chart for "Pages Crawled Per Day", it shows a chart of the last 90 days and no way to export the data or change the date range that I can see.  I'm assuming that if I could just tally the count from each day of the month, that should be in line with how many views were attributed to Googlebot.  Is that correct?  Also, do you know an easier way to get that data without having to move the mouse over each day in the chart and jot them all down individually?  Thanks for all the help on this.

                                              • Re: Bot/crawler pageview spike
                                                mnevill

                                                I tried adding up each day individually and the Crawl report from Webmaster tools is more than double what the Jive hosting report showed for Googlebot views.  It looks like that is not the right approach.  If you know of another way to see how many botviews Googlebot creates in a month from the Webmaster tools, I would appreciate any advice.

                                                  • Re: Bot/crawler pageview spike
                                                    it2000

                                                    Getting the access logs and running "grep -ic Googlebot" on them gives you a very good number, as long as the logs contain the user agent. To get the exact number you need to make sure that the source IP addresses match Google servers.

                                                    There could be HEAD requests or 30x/50x return codes which may or may not count as Jive page views.
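
A minimal Python sketch of the same idea, grouped by month. It assumes the access logs are in the common "combined" format (an assumption, since the hosted log format is not stated here), and the sample lines and IP addresses are invented. Whether HEAD requests and non-2xx responses count as Jive page views is unknown, so that filter is a switch rather than a fixed rule.

```python
import re
import socket
from collections import Counter
from datetime import datetime

# Combined Log Format: ip - user [time] "METHOD path proto" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def is_google_ip(ip):
    """Reverse-resolve ip, check the host name, then forward-confirm it (needs network)."""
    try:
        host = socket.gethostbyaddr(ip)[0]
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        return ip in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False


def googlebot_views_per_month(lines, pageviews_only=True, verify_ip=None):
    """Count log lines whose user agent mentions Googlebot, keyed by 'YYYY-MM'."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None or "googlebot" not in m["agent"].lower():
            continue
        # Assumption: a "page view" is a GET answered with a 2xx status.
        if pageviews_only and (m["method"] != "GET" or not m["status"].startswith("2")):
            continue
        if verify_ip is not None and not verify_ip(m["ip"]):
            continue
        when = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        counts[when.strftime("%Y-%m")] += 1
    return counts


# Invented sample data for illustration only.
UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE_LOG = [
    '66.249.66.1 - - [10/Jun/2013:13:55:36 +0000] "GET /thread/12345 HTTP/1.1" 200 5120 "-" "%s"' % UA,
    '66.249.66.1 - - [11/Jun/2013:08:01:02 +0000] "HEAD /message/67890 HTTP/1.1" 200 0 "-" "%s"' % UA,
    '66.249.66.1 - - [02/Jul/2013:09:10:11 +0000] "GET /people/some-user HTTP/1.1" 302 0 "-" "%s"' % UA,
    '66.249.66.1 - - [03/Jul/2013:09:10:11 +0000] "GET /thread/12345 HTTP/1.1" 200 5120 "-" "%s"' % UA,
    '203.0.113.9 - - [03/Jul/2013:10:00:00 +0000] "GET /thread/12345 HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

print(dict(googlebot_views_per_month(SAMPLE_LOG)))
print(dict(googlebot_views_per_month(SAMPLE_LOG, pageviews_only=False)))
```

Passing verify_ip=is_google_ip adds the IP check, at the cost of DNS lookups for every matching line. Comparing the two printed counts shows how much of a gap the HEAD and redirect requests alone could explain.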

                                                      • Re: Bot/crawler pageview spike
                                                        mnevill

                                                        Jive hosting supplied a spreadsheet with each bot user agent name and how many views it racked up. I just can't seem to figure out where those numbers come from, since they don't match what Webmaster Tools says. It is very possible I'm just not reading them correctly in Webmaster Tools.