Previously, in Part 1, I described the skill sets required to understand the theory and concepts behind URL rewriting in ASP.
In Part 2, I will:
- discuss the patterns and practices available for rewriting.
- show several examples of RegEx patterns and some real-world web addresses to validate against them.
- show some visible benefits and flaws of the RegEx design, and some possible ways around the flaws.
Unfortunately, the denser you make a RegEx pattern, the harder it is to read and the more loopholes it tends to introduce. So before you even attempt a generic RegEx pattern for rewriting, I suggest you first define a standard for the QueryString and develop the RegEx around that. But that is for the next conversation about this.
On to the RegEx patterns and results:
Pattern 1: ^(.+)/(.+)/(.+)/(.+)(\((.+)\).aspx)$
Address #1:
http://msdn.microsoft.com/en-us/library/ms229335(v=VS.90).aspx
Group  Pattern          Segment
$1     (.+)             http://msdn.microsoft.com
$2     (.+)             en-us
$3     (.+)             library
$4     (.+)             ms229335
$5     (\((.+)\).aspx)  (v=VS.90).aspx
$6     (.+)             v=VS.90

A potential rewrite would be:
(Illustrative only: this address DOES NOT WORK!)
http://msdn.microsoft.com/index.aspx?lang=en-us&type=library&id=ms229335&v=VS.90
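The group table and the rewrite above can be reproduced with a short script. This is a minimal sketch in Python rather than ASP, chosen only because it is easy to run; the `REWRITE_1` template is an assumption that mirrors the potential rewrite shown above, and Python writes `$1` as `\1` in replacement strings.

```python
import re

# Pattern 1 from the post, as a raw string. The dot before "aspx" is left
# unescaped exactly as in the original, so it matches any character there.
PATTERN_1 = re.compile(r"^(.+)/(.+)/(.+)/(.+)(\((.+)\).aspx)$")

# Hypothetical rewrite template. Group 6 already contains "v=VS.90",
# so it is appended as-is after the "&".
REWRITE_1 = r"\1/index.aspx?lang=\2&type=\3&id=\4&\6"

address = "http://msdn.microsoft.com/en-us/library/ms229335(v=VS.90).aspx"

match = PATTERN_1.match(address)
groups = match.groups() if match else ()
rewritten = PATTERN_1.sub(REWRITE_1, address)

# Print each captured group in the same $n notation the table uses.
for number, segment in enumerate(groups, start=1):
    print(f"${number}: {segment}")
print(rewritten)
```

Because every group is a greedy `(.+)`, the engine backtracks from the end of the string, which is why `$1` swallows the protocol and domain together.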
Address #2:
http://msdn.microsoft.com/en-us/library/system(VS.85,printer).aspx
Group  Pattern          Segment
$1     (.+)             http://msdn.microsoft.com
$2     (.+)             en-us
$3     (.+)             library
$4     (.+)             system
$5     (\((.+)\).aspx)  (VS.85,printer).aspx
$6     (.+)             VS.85,printer

A potential rewrite would be:
(Illustrative only: this address DOES NOT WORK!)
http://msdn.microsoft.com/index.aspx?lang=en-us&type=library&ns=system&v=VS.85,printer
Some noticeable flaws in the design of the RegEx pattern:
- Tied down to the number of "folders" after the protocol-domain segment
- Does not validate URLs without a page extension (such as http://en.wikipedia.org/wiki/Msdn)
- The 6th segment is only useful with one (1) value inside the parentheses; the pattern would need to accommodate more if you wanted to print, preview, etc. the page.
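The first two flaws are easy to see in practice. The sketch below is Python, used purely for illustration, and the second address is a hypothetical one made by dropping the "en-us" folder from Address #1. It is not a real MSDN address.

```python
import re

PATTERN_1 = re.compile(r"^(.+)/(.+)/(.+)/(.+)(\((.+)\).aspx)$")

# No parentheses and no .aspx extension, so the pattern cannot match at all.
no_extension = PATTERN_1.match("http://en.wikipedia.org/wiki/Msdn")

# Hypothetical address with one folder fewer. It still matches, but every
# group shifts one position to the left, so $1 is no longer the domain.
shallow = PATTERN_1.match("http://msdn.microsoft.com/library/ms229335(v=VS.90).aspx")
shallow_groups = shallow.groups()[:4] if shallow else ()

print(no_extension)
print(shallow_groups)
```

The second case is arguably the more dangerous one: the pattern does not fail, it quietly hands back "http:/" as `$1` and the domain as `$2`, so a rewrite built on those groups would produce a wrong address.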
Pattern 2: ^(.+)/(.+)/(.+)/(.+)(\((.+)\))$
Address #1:
http://msdn.microsoft.com/en-us/library/ms229335(v=VS.90).aspx
This address does not validate. The reason is the RegEx segment (\((.+)\))$. This segment checks the end of the string being validated (the address) against this condition, and since the string ends in ".aspx" rather than ")", the whole match fails. Read up on the caret (^) and dollar ($) operators; they are key to understanding how a string gets validated and the associated order of precedence.
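To see the effect of the dollar anchor in isolation, the sketch below (Python, for illustration only) runs Pattern 2 against Address #1 twice: once as written, and once with the trailing `$` removed. The unanchored variant is not one of the patterns in this post; it is there only to show what the anchor changes.

```python
import re

address = "http://msdn.microsoft.com/en-us/library/ms229335(v=VS.90).aspx"

# Pattern 2 as written: "$" demands that the string end right after ")".
anchored = re.search(r"^(.+)/(.+)/(.+)/(.+)(\((.+)\))$", address)

# Same pattern without the trailing "$" (illustration only): the match
# is now allowed to stop before ".aspx".
unanchored = re.search(r"^(.+)/(.+)/(.+)/(.+)(\((.+)\))", address)
matched_text = unanchored.group(0) if unanchored else None

print(anchored)
print(matched_text)
```

Dropping the anchor makes the address "validate", but the ".aspx" tail is silently ignored, which is rarely what a rewrite rule should do.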
Address #2:
http://msdn.microsoft.com/en-us/library/system(VS.85,printer)
Group  Pattern      Segment
$1     (.+)         http://msdn.microsoft.com
$2     (.+)         en-us
$3     (.+)         library
$4     (.+)         system
$5     (\((.+)\))   (VS.85,printer)
$6     (.+)         VS.85,printer

A potential rewrite would be:
(Illustrative only: this address DOES NOT WORK!)
http://msdn.microsoft.com/index.aspx?lang=en-us&type=library&ns=system&v=VS.85,printer
Some noticeable flaws in the design of the RegEx pattern:
- Tied down to the number of "folders" after the protocol-domain segment
- Does not validate URLs with a page extension (such as http://en.wikipedia.org/wiki/Msdn.php)
- The 6th segment is only useful with one (1) value inside the parentheses; the pattern would need to accommodate more if you wanted to print, preview, etc. the page.
Pattern 3: ^(.+)/(.+)/(.+)/(.+)(\((.+)\).aspx|\((.+)\))$
Address #1:
http://msdn.microsoft.com/en-us/library/ms229335(v=VS.90).aspx
Group  Pattern                   Segment
$1     (.+)                      http://msdn.microsoft.com
$2     (.+)                      en-us
$3     (.+)                      library
$4     (.+)                      ms229335
$5     (\((.+)\).aspx|\((.+)\))  (v=VS.90).aspx
$6     (.+)                      v=VS.90
$7     (.+)                      [NO MATCH]

A potential rewrite would be:
(Illustrative only: this address DOES NOT WORK!)
http://msdn.microsoft.com/index.aspx?lang=en-us&type=library&id=ms229335&v=VS.90
Address #2:
http://msdn.microsoft.com/en-us/library/system(VS.85,printer)
Group  Pattern                   Segment
$1     (.+)                      http://msdn.microsoft.com
$2     (.+)                      en-us
$3     (.+)                      library
$4     (.+)                      system
$5     (\((.+)\).aspx|\((.+)\))  (VS.85,printer)
$6     (.+)                      [NO MATCH]
$7     (.+)                      VS.85,printer

A potential rewrite would be:
(Illustrative only: this address DOES NOT WORK!)
http://msdn.microsoft.com/index.aspx?lang=en-us&type=library&ns=system&v=VS.85,printer
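Since Pattern 3 puts the parenthesised value in Group 6 for one alternative and Group 7 for the other, whatever consumes the match has to check both. A minimal Python sketch of that check follows; the helper name `parenthesised_value` is made up for this example.

```python
import re

PATTERN_3 = re.compile(r"^(.+)/(.+)/(.+)/(.+)(\((.+)\).aspx|\((.+)\))$")

def parenthesised_value(address):
    """Return the text inside the parentheses, or None if there is no match."""
    match = PATTERN_3.match(address)
    if match is None:
        return None
    # Only one alternative can match, so one of the two groups is None.
    return match.group(6) or match.group(7)

with_ext = parenthesised_value(
    "http://msdn.microsoft.com/en-us/library/ms229335(v=VS.90).aspx")
without_ext = parenthesised_value(
    "http://msdn.microsoft.com/en-us/library/system(VS.85,printer)")

print(with_ext)
print(without_ext)
```

Folding the two groups into one value this way keeps the rest of the rewrite logic from caring which alternative matched.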
Some noticeable flaws in the design of the RegEx pattern:
- Tied down to the number of "folders" after the protocol-domain segment
- Whether it's Group 6 or 7, you still need a pattern to separate the values inside the parentheses if there are multiple values. If there is only one, you are good to go; otherwise, a deeper pattern for the parenthesis section is needed.
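One possible "deeper pattern" for the parenthesis section is a second pass over the captured text. The sketch below is Python and assumes the values are separated by commas, as in the "VS.85,printer" example; the function name `split_values` is hypothetical.

```python
import re

def split_values(inside_parentheses):
    """Split the captured parenthesis text on commas into separate values."""
    # [^,]+ matches each run of characters that contains no comma.
    return re.findall(r"[^,]+", inside_parentheses)

multiple = split_values("VS.85,printer")
single = split_values("v=VS.90")

print(multiple)
print(single)
```

Running the split as a separate step keeps the main rewrite pattern unchanged, at the cost of handling the values in code rather than in the rewrite rule itself.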