rewrite rule medo

From: Peter (BOUGHTONP) 13 Feb 2017 23:46
To: CHYRON (DSMITHHFX) 7 of 9
As for the search engine stuff, the caret (^) anchors your match to the start of the string, but the Googlebot user agent is "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)", so remove the caret. Also, you shouldn't need the parentheses: the ! is a prefix that negates the whole pattern, so try just "RewriteCond %{HTTP_USER_AGENT} !google|yahoo|bing [NC]"
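
To illustrate the difference (the anchored line is only a guess at what such a condition might look like, not your actual rule):
Code: 
# Anchored (hypothetical): the pattern is only found when the user agent
# STARTS with one of the names. Googlebot's starts with "Mozilla/5.0", so
# the negated condition stays true and the bot gets rewritten anyway.
RewriteCond %{HTTP_USER_AGENT} !^(google|yahoo|bing) [NC]

# Unanchored: the pattern is found anywhere in the user agent string, so
# the negated condition is false for Googlebot and the rule is skipped.
RewriteCond %{HTTP_USER_AGENT} !google|yahoo|bing [NC]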

Is it not simpler to use robots.txt to block them?

From: CHYRON (DSMITHHFX) 14 Feb 2017 02:15
To: Peter (BOUGHTONP) 8 of 9
The object is to allow search engines to crawl unrewritten *.html urls (except index, and those in mobile/), and to rewrite human-submitted urls (from search results) with *.html suffix to the hashed urls -- I've got it all working with javascript redirects, but I think intercepting it before anything gets served would be preferable. Suffice to say it's become an academic exercise as the client has decided they don't want the app to be searchable after all. Now I just want to see if I can get the htaccess method to work.
EDITED: 14 Feb 2017 02:17 by DSMITHHFX
From: CHYRON (DSMITHHFX) 17 Feb 2017 20:43
To: ALL 9 of 9
So here's what ended up working in testing on two different Apache 2.2 servers.

OS X development server on powermac G5 (Apache installed through macports), localhost:8081 pointed at virtualhost:
Code: 
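RewriteEngine On
RewriteBase /

# Only continue if neither the user agent nor the referrer mentions a search engine
RewriteCond %{HTTP_USER_AGENT} !google|yahoo|bing [NC]
RewriteCond %{HTTP_REFERER} !google|yahoo|bing [NC]
# Intended to skip anything under mobile/ and index.html
RewriteCond %{REQUEST_URI} !^.*/mobile/
RewriteCond %{REQUEST_FILENAME} !^/index.html$
# name-rest.html -> /#name/rest (NE stops the # being escaped to %23)
RewriteRule ^([a-z]+)-(.+)\.html$ /#$1/$2 [NE,R=301,L]

# name.html -> /#name (the condition is meant to skip already-hashed URLs)
RewriteCond %{REQUEST_URI} !^.*/#[a-z]+/[.*]$
RewriteRule ^([a-z]+)\.html$ /#$1 [NE,R=301,L]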
Staging server on Ubuntu 14.04 ppc (powermac G4), hosted in an "seo2" subdirectory:
Code: 
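RewriteEngine On
RewriteBase /
# Only continue if neither the user agent nor the referrer mentions a search engine
RewriteCond %{HTTP_USER_AGENT} !google|yahoo|bing [NC]
RewriteCond %{HTTP_REFERER} !google|yahoo|bing [NC]
# Skip anything under mobile/ and any index.html
RewriteCond %{REQUEST_URI} !^.*/mobile/.*$
RewriteCond %{REQUEST_URI} !^.*/index.html$
# name-rest.html -> /seo2/#name/rest (plain R is a 302 temporary redirect)
RewriteRule ([a-z]+)-(.+)\.html$ /seo2/#$1/$2 [NE,R,L]

# Conditions only apply to the next RewriteRule, so they are repeated here
RewriteCond %{HTTP_USER_AGENT} !google|yahoo|bing [NC]
RewriteCond %{HTTP_REFERER} !google|yahoo|bing [NC]
RewriteCond %{REQUEST_URI} !^.*/mobile/.*$
# Meant to skip already-hashed URLs
RewriteCond %{REQUEST_URI} !^.*/#[a-z]+/[.*]$
# name.html -> /seo2/#name
RewriteRule ([a-z]+)\.html$ /seo2/#$1 [NE,R,L]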
I haven't found any good online htaccess documentation or tutorials (I relied a lot on Stack Overflow), so these evolved through a lot of trial and (mostly) error.

htaccess seemed pretty erratic and unreliable on the staging server with subdirectory, with frequent browser cache-clearing required or sometimes just waiting a few hours.
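
For what it's worth, here is a possibly tidier take on the development-server rules. It is an untested sketch that assumes the rules sit in a .htaccess file in the document root. The changes: browsers never send the #fragment to the server, so a condition that looks for # in REQUEST_URI can never match and is left out; [.*] is a character class (a literal dot or asterisk), not "anything"; in .htaccess context REQUEST_FILENAME is normally the full filesystem path, so index.html is tested against REQUEST_URI instead; and R=302 is used because browsers may cache 301 redirects, which makes repeated testing awkward.
Code: 
RewriteEngine On
RewriteBase /

# Assumption: crawlers, search-engine referrals, mobile/ and index.html
# are all left unredirected, as in the rules above.
RewriteCond %{HTTP_USER_AGENT} !google|yahoo|bing [NC]
RewriteCond %{HTTP_REFERER} !google|yahoo|bing [NC]
RewriteCond %{REQUEST_URI} !/mobile/
RewriteCond %{REQUEST_URI} !/index\.html$
# name-rest.html -> /#name/rest
RewriteRule ^([a-z]+)-(.+)\.html$ /#$1/$2 [NE,R=302,L]

# RewriteCond lines only apply to the rule right after them, so repeat them.
RewriteCond %{HTTP_USER_AGENT} !google|yahoo|bing [NC]
RewriteCond %{HTTP_REFERER} !google|yahoo|bing [NC]
RewriteCond %{REQUEST_URI} !/mobile/
RewriteCond %{REQUEST_URI} !/index\.html$
# name.html -> /#name
RewriteRule ^([a-z]+)\.html$ /#$1 [NE,R=302,L]
Once the behaviour looks right, R=302 could be switched back to R=301 if a permanent redirect is what's wanted.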