Saturday, September 26, 2009

Roaming Profiles Sync-to-Server Problems

For several years, I’ve used roaming profiles in my Microsoft Active Directory domain at home so my family members get to keep their settings after I reimage their computers, which happens once in a while.  Sure I could have used the USMT or FAST wizard or copied the data manually but this “enterprise” way of doing it allowed me to learn the technology enough to actually be able to recommend (or not) to my clients, based on personal experiences with it.  Note here that using ONLY roaming profiles to store user data and settings on a server is NOT recommended.  A comprehensive desktop strategy which integrates several technologies such as redirected folders, and appropriate exclusions via Group Policies, among other things is required for good results.  In my home setup, I’ve used Dan Holme’s solutions collection 3, as described in his Microsoft Press book “Windows Administration Resource Kit, Productivity Solutions for IT Professionals”, ISBN 978-0-7356-2431-3.  This has served me quite well.  Before implementing these solutions, roaming profiles and other technologies never worked well together in my network, so I highly recommend his book.

One particular headache with roaming profiles however is when the profile stops being uploaded to the server at logoff.  I’ve discovered this happens for different reasons, and each time, it was difficult to identify the root cause.  Searching the Internet for the particular error message (or lack thereof), or event ID, did not always lead to any conclusive fix.  Most of the time, solutions such as domain un-join/re-join, or deleting the local profile, I thought were too drastic, and felt like they never would reveal the true cause of the problem.  In general, I tend to troubleshoot in a more methodical way, and don’t fall for the random “shoot-in-every-direction-until-you-kill-the-problem” method.

So, I thought I’d share my findings here in hopes they’d be useful for others.

Once, I replaced a Windows Vista Business x86 no-brand machine for a user with a new Windows Vista Business x64 HP Pavilion machine.  The swap went well, and the user logged in and got his desktop and settings just like in his old machine.  At logoff however, I noticed it was very fast, and the usual 30 seconds to a minute the system takes to write the roaming profile to the server just wasn’t there.  No error message or event ID was present in this case, and had I not been paying attention and known better, I would have concluded “wow, this new machine is fast” and been done with it!  So, I looked on the server and noticed the last time the ntuser.dat file was written was when he had last logged off his old computer.  So, I Googled for a long time and finally found someone with an HP computer who had the same problem and who had to disable the “NVIDIA Display Driver Service” from msconfig/services tab.  When I implemented this workaround, the roaming profile started to get written to the server again at logoff.  And, the service has been disabled ever since with no adverse effect and no impact that I can tell.  This was reported to HP and they advised they would fix it.  I do not know the status of this case however.

Another time, on my own machine, I got an error message at logoff that my roaming profile was not completely synchronized with the server.  I therefore checked the server and found ntuser.dat had indeed been updated when I logged off.  I logged back in and then got a balloon pop-up in the system tray which indicated that I was logged on with a temporary profile.  I went to the event log and found many event ID 1509 errors followed by 1504, corresponding to the previous logoff.  The event 1509 errors were “Windows cannot copy file C:\Users\username\AppData\Roaming\Macromedia\Flash Player\#SharedObjects\TSACWEND\cisco.elementk.com…” “The filename or extension is too long”.  So, I looked into this directory and indeed found many nested folders under it.  Basically, this was cached data from a training session I had just taken at Cisco elementk.  The training was apparently based on Flash and had cached too many levels of folders, which prevented the roaming profile mechanism from copying them to the server.  Because I was done with the course and did not need these “cookie-like” files anymore, I deleted the “cisco.elementk.com” folder and did a logoff.  No further errors!
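A quick way to spot this kind of problem before the next logoff is to scan the local profile for overly long paths.  The following is a minimal Python sketch, not part of my original troubleshooting.  The function name find_long_paths is made up for illustration, and the 260-character limit is an assumption (the classic Windows MAX_PATH); the limit that actually bites may be lower, because the path on the server share can be longer than the local one.

```python
import os

# Assumption: 260 is the classic Windows MAX_PATH value. The effective
# limit during the profile copy may differ on your systems.
MAX_PATH = 260

def find_long_paths(root, limit=MAX_PATH):
    """Return (length, path) pairs under root whose full path is at or over the limit."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if len(full) >= limit:
                hits.append((len(full), full))
    # Longest paths first, so the worst offenders are at the top.
    return sorted(hits, reverse=True)
```

For example, calling find_long_paths on the AppData\Roaming folder of a profile lists the deepest entries first, which is usually enough to see which application created them.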

As you can see, roaming profiles can be a bit finicky.  However, if you take the time to research the error, it is possible to fix the problem without resorting to the more drastic measures suggested in some online forums.

Friday, August 14, 2009

Cisco IOS Configs for QoS Shaping of Vonage Traffic (Part 3 of 3)

In my last post, I described how to ensure the Vonage traffic consisting of voice and call signaling gets the correct DSCP while on the internal network.  However, even though packets are marked correctly, they will not be prioritized over other traffic during congestion if we don’t configure our routers to take action on them. 

In this post, I will implement traffic shaping on the interface of the routers facing the Internet, meaning the ones directly-connected to the cable modem and ADSL modem. 

Note- There is a lesser need to prioritize traffic anywhere else inside our network because of adequate LAN bandwidth, although should a need arise in the future, our DSCP values are already set and ready to use.

Basically, the ISP polices or shapes our upload traffic to a certain rate.  In my case, cable modem upload is not usually able to go higher than 512 Kbps, and ADSL upload is limited to a maximum of 384 Kbps.

So, if we don’t implement traffic shaping to these rates on our routers, we basically let the ISP control what traffic to delay or drop when we exceed the upload speed we pay for.  This is very bad because the ISP (either on our modem or on an upstream device) will most likely drop or delay some of our voice traffic!  The results are audio quality problems, among other things.

Hence, we want to perform the QoS prioritization ourselves.  Say we give 90 Kbps to our voice traffic, which is enough for one G.711 phone call (G.711 is the codec Vonage uses), 4% of the remaining bandwidth to call signaling, and the rest of the upload bandwidth to the other non-time-sensitive apps like email, HTTP and FTP.  Then, in case of congestion, we guarantee all the voice traffic will upload promptly, and anything over the shaped rate will get randomly dropped or delayed.

There are several Cisco IOS QoS mechanisms that can help us meet the above requirements.  In this Vonage case, I chose Low Latency Queueing (LLQ), Class-Based Weighted Fair Queueing (CBWFQ), Weighted Fair Queueing (WFQ), Weighted Random Early Detection (WRED), and traffic shaping.  These tools will act upon our previously assigned DSCP values.

Let’s use the router supporting the cable modem connection for our example.  The Cisco QoS SRND document teleworker section states the shaper should be set to a rate of 95% of the upload speed advertised by your cable ISP.  Even though this is not a teleworker environment, the hardware and ISP access media is the same so I opted to follow this guideline.  However, in my case, even though my cable ISP tells me I won’t be able to upload to more than 512 Kbps, I know they allow more because I monitor this via NetFlow and the reports show speeds up to 768 Kbps sometimes.  So, I decided to shape at 512 Kbps instead of 486 Kbps knowing I have enough headroom.  This can be easily adjusted later.
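The shaper guideline is simple arithmetic, sketched here in Python.  The function name shaped_rate_kbps is made up for illustration; the percentages are the SRND figures quoted in this post (95% for cable, and 70% for ADSL as noted further down).

```python
def shaped_rate_kbps(advertised_kbps, percent):
    """Shaping rate as a percentage of the advertised upload speed, rounded to whole Kbps."""
    return round(advertised_kbps * percent / 100)

cable = shaped_rate_kbps(512, 95)  # 486.4 rounds to 486
adsl = shaped_rate_kbps(384, 70)   # 268.8 rounds to 269
print(cable, adsl)                 # prints: 486 269
```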

Before doing the modular QoS CLI (MQC) commands, we need to tell the router how much bandwidth it has on the outside interface, and also force the calculation of the remaining bandwidth to equal 100% after subtracting our voice priority queue (LLQ) of 90 Kbps from 512 Kbps.  Here is the explanation:

Bandwidth 512 – 90 (LLQ) = 422 Kbps

422 Kbps = 100% of the remaining bandwidth available for CBWFQ/WFQ, as long as the following commands are entered on the interface:

int fa0/1
 bandwidth 512
 max-reserved-bandwidth 100

We just do this to allow simpler calculations.  At this point, I want 4% (422 x 0.04 = 17) assigned to call signaling, and 96% (422 x 0.96 = 405) assigned to the rest.
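The same calculation as a small Python sketch, in case you want to plug in your own numbers.  The function names are made up for illustration; the figures are the ones used in this post.

```python
def remaining_bandwidth(link_kbps, priority_kbps):
    """Bandwidth left for CBWFQ/WFQ once the LLQ priority queue is subtracted."""
    return link_kbps - priority_kbps

def share(remaining_kbps, percent):
    """A 'bandwidth remaining percent' share, rounded to whole Kbps."""
    return round(remaining_kbps * percent / 100)

remaining = remaining_bandwidth(512, 90)  # 422
signaling = share(remaining, 4)           # 16.88 rounds to 17
everything_else = share(remaining, 96)    # 405.12 rounds to 405
print(remaining, signaling, everything_else)  # prints: 422 17 405
```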

policy-map PM_QUEUEING
 class CM_VOICE_BEARER
    priority 90
 class CM_CALL_SIGNALING
    bandwidth remaining percent 4
 class class-default
    random-detect
    fair-queue

Note- Similar calculations would be done for the ADSL router, except in the QoS SRND, the recommendation is shaping at 70% of the upload contract.  In our case, 384 Kbps x 70% = 269 Kbps.

The above commands tell the router to use a 3-class (queues) model to prioritize our 3 types of traffic.

The keyword “priority” in the first class tells the router that as long as there are packets in this first queue, it should forward them and not service the next class until this priority queue has been serviced.  We prevent ‘starvation’ of the other 2 classes by policing at 90 Kbps.  This means that any traffic above 90 Kbps will be dropped and the next class will be serviced.  As mentioned above, this mechanism for the first class is called LLQ.  Now, in this case, the policing at 90 Kbps is appropriate because we have only one Vonage ATA with a single phone line, so we will always have a maximum of a single call at once.  Hence, the 90 Kbps will never be exceeded.  If we add a second line, we would need to double this priority queue to 180 Kbps, which would mean there will be less bandwidth available for the rest of the classes.  Or, you could configure your Vonage setup (via their web-based dashboard) to use a lower bandwidth codec such as G.729.  So, you can see that in an enterprise environment, when LLQ is used for voice packets, which is the recommended method, some way of ensuring there will only be a certain maximum number of calls at once is very important.  Call Admission Control (CAC) and gatekeeper devices are among the tools for this.  However, that is not in the scope of this article since we are only dealing with a home network.
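To see where a figure like 90 Kbps per call comes from, here is a rough Python sketch of per-call bandwidth.  It rests on assumptions that are not from my captures: 20 ms packetization, 40 bytes of IP/UDP/RTP headers per packet, and 18 bytes of Ethernet header and trailer.  The function names are made up for illustration.

```python
def call_kbps(codec_kbps, packet_ms=20, l3_overhead_bytes=40, l2_overhead_bytes=18):
    """Approximate one-way bandwidth of a single call, including headers.

    Assumptions: 20 ms of audio per packet, 40 bytes of IP/UDP/RTP headers,
    18 bytes of Ethernet framing. Other media have different L2 overhead.
    """
    payload_bytes = codec_kbps * 1000 / 8 * packet_ms / 1000
    packets_per_second = 1000 / packet_ms
    frame_bytes = payload_bytes + l3_overhead_bytes + l2_overhead_bytes
    return frame_bytes * 8 * packets_per_second / 1000

def priority_queue_kbps(calls, per_call_kbps=90):
    """Size of the LLQ priority queue for a given number of simultaneous calls."""
    return calls * per_call_kbps

g711 = call_kbps(64)  # about 87.2 Kbps, which the 90 Kbps queue covers
g729 = call_kbps(8)   # about 31.2 Kbps
two_lines = priority_queue_kbps(2)  # 180
```

Under these assumptions one G.711 call fits just under 90 Kbps, which is consistent with sizing the priority queue at 90 Kbps per line.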

The next class for call signaling is configured with the “bandwidth” keyword.  This QoS tool is called CBWFQ.  We simply assign 4% of our 422 Kbps remaining to call signaling.

The last class (class-default) is always present.  We could configure it with the remaining 96%.  Instead, we don’t assign a bandwidth, which tells IOS to give it “the rest of the bandwidth”.  One thing to keep in mind when not assigning a bandwidth to class-default is that the other classes can rob it of bandwidth.  This is because of the way the CBWFQ algorithm has been coded: if classes protected with a bandwidth statement are offered more traffic than their minimum bandwidth guarantee, the algorithm tries to protect such excess traffic at the direct expense of robbing bandwidth from class-default (if class-default is configured with fair-queue), unless class-default itself has a bandwidth statement (providing itself with a minimum bandwidth guarantee).  In our case, we don’t mind if the call signaling class robs class-default for more bandwidth.

We also configure the keyword “random-detect”, which enables WRED.  During congestion, WRED will selectively drop TCP packets matching this class to ensure there is no global synchronization of TCP traffic.  Basically, because TCP packets will be re-transmitted if dropped, we can do this with almost no noticeable impact.  However, the benefit is when a TCP session notices there are dropped packets, it slows down.  So, it therefore helps our router, and allows all traffic to be forwarded more efficiently.
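For the curious, the general idea behind WRED can be sketched in a few lines of Python.  This is a simplified model of the textbook behavior, not the exact IOS implementation, and the thresholds in the example are made-up values rather than IOS defaults for any DSCP.

```python
def wred_drop_probability(avg_queue, min_th, max_th, mark_prob_denominator):
    """Simplified WRED model: no drops below min_th, everything dropped at or
    above max_th, and a linear ramp up to 1/mark_prob_denominator in between."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    ramp = (avg_queue - min_th) / (max_th - min_th)
    return ramp / mark_prob_denominator

# Hypothetical thresholds: min 20 packets, max 40 packets, denominator 10.
for depth in (10, 30, 40):
    print(depth, wred_drop_probability(depth, 20, 40, 10))
```

The point is that drops start early and gently as the average queue depth grows, so individual TCP sessions back off at different times instead of all at once.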

We also use the “fair-queue” keyword on the class-default to use WFQ instead of FIFO.  WFQ classifies packets based on flows and favors low-volume high precedence flows.

Now that we have our queues (classes) configured with the bandwidth desired, during congestion, our preferred voice traffic will be prioritized.  However, we still need to configure the shaper to ensure we don’t exceed our ISP upload speed.  The following commands describe how to do this.

policy-map PM_SHAPING
 class class-default
    shape average 512000 5120 0
    service-policy PM_QUEUEING

In this configuration, we assign the class-default a shaping rate of 512 Kbps with the most appropriate Bc (5120 b) and Tc (10 ms) for voice traffic.  An explanation on how those second and third values were determined would be too lengthy for this blog but can definitely be found on the Cisco website or any good Cisco QoS book.
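Without going into the full theory, the relationship between the three numbers can be checked with a short Python sketch.  It assumes the usual shaper relationship Tc = Bc / CIR, and that the third value on the "shape average" line is the excess burst (Be), set to 0 here.  The function names are made up for illustration.

```python
def shaping_interval_ms(bc_bits, cir_bps):
    """Tc in milliseconds, from Tc = Bc / CIR."""
    return bc_bits * 1000 / cir_bps

def committed_burst_bits(cir_bps, tc_ms):
    """Bc in bits for a target Tc, from Bc = CIR x Tc."""
    return cir_bps * tc_ms / 1000

print(shaping_interval_ms(5120, 512000))   # prints: 10.0
print(committed_burst_bits(512000, 10))    # prints: 5120.0
```

In other words, the shaper releases up to 5120 bits (640 bytes) every 10 ms, which averages out to 512 Kbps.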

The reason we assign “class-default” the shaping rate is we need the router to activate shaping whenever “any” packet exceeds our 512 Kbps contract.  Once the shaper is active, it then can queue according to our “PM_QUEUEING” classes.  The shaper decides when to dequeue the next packet, but it relies on PM_QUEUEING to tell it which packet to dequeue next.  That’s how our voice traffic will get prioritized over our previously configured class-default.

The last step is to assign the policy-map PM_SHAPING to our outside interface in the outbound direction.

int fa0/1
 service-policy output PM_SHAPING
end

To view the QoS statistics:

show policy-map int fa0/1 out

I hope these three posts on how to configure your Cisco routers for Vonage traffic were useful.  Feel free to comment!

Thursday, June 18, 2009

Cisco IOS Configs for QoS Classification and Marking of Vonage Traffic (Part 2 of 3)

This is the second post of 3 about my experience with Vonage QoS on my Cisco network.

As mentioned earlier, here are the configs I came up with.  But first, here is the layout of the network:

    Vonage Datacenter
           |
           |
        Internet 
        /       \  
       /         \
Cable modem     ADSL modem
      |            \
Fa0/1 |             \ Fa0/1
Cisco router-----Cisco router
Fa0/0 |              | Fa0/0
      |              |

Fa0/1 |              | Fa0/2 
Cisco Catalyst 3550 switch
Fa0/23    |
          |
  VDV21 ATA (192.168.8.96)

Note- I use the cable ISP as my primary ISP as they give me 512 Kbps of uploads vs 384 Kbps for the DSL telco.  However, the QoS configs are applied to both routers in case of failover. 

So, the first step was to ensure the DSCP for every voice bearer and call signaling packet leaving the 3550 was correct. Because the voice bearer RTP traffic leaving the VDV21 to the Internet was already marked as DSCP EF, I didn't change anything. For the call signaling traffic leaving the VDV21, I classified and marked it on the input interface of the 3550 as follows.


mls qos
ip access-list extended ACL_CALL_SIGNALING
 deny udp host 192.168.8.96 any dscp ef
 permit udp host 192.168.8.96 any range 10000 20000

class-map match-any CM_CALL_SIGNALING
  match access-group name ACL_CALL_SIGNALING

policy-map PM_CLASSIFY_AND_MARK
  class CM_CALL_SIGNALING
   set dscp cs3

int fa0/23
 desc VDV21 - VONAGE ATA
 service-policy input PM_CLASSIFY_AND_MARK

int range fa0/1 - 2
 mls qos trust dscp


The first ACE will prevent the voice bearer originated by the ATA from being re-marked. The second will allow any packet with a source of the ATA, and a destination anywhere with a UDP port >= 10000 and <= 20000 to be marked.
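If the first-match logic of the ACL is not obvious, here is a small Python model of it.  This is only an illustration of how the two ACEs are evaluated in order; the packet dictionaries and the function name are made up, and the ports in the examples come from my captures.

```python
ATA = "192.168.8.96"

def acl_call_signaling(pkt):
    """Return True if ACL_CALL_SIGNALING would permit the packet (so it gets marked CS3).

    ACEs are evaluated top-down and the first match wins.
    """
    from_ata_udp = pkt["proto"] == "udp" and pkt["src"] == ATA
    # ACE 1: deny udp host 192.168.8.96 any dscp ef
    if from_ata_udp and pkt["dscp"] == "ef":
        return False
    # ACE 2: permit udp host 192.168.8.96 any range 10000 20000
    if from_ata_udp and 10000 <= pkt["dst_port"] <= 20000:
        return True
    # Implicit deny at the end of every ACL.
    return False

voice = {"proto": "udp", "src": ATA, "dst_port": 14696, "dscp": "ef"}
rtcp = {"proto": "udp", "src": ATA, "dst_port": 14697, "dscp": "default"}
sip = {"proto": "udp", "src": ATA, "dst_port": 10000, "dscp": "default"}
print(acl_call_signaling(voice), acl_call_signaling(rtcp), acl_call_signaling(sip))
# prints: False True True
```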

The class-map is created to classify the call signaling based on the ACL. Then, a policy-map is created to mark the traffic identified by the class map to DSCP CS3. I chose CS3 for call signaling based on the Cisco QoS SRND document. This way, if future needs warrant adding more classes such as a mission-critical class, AF31 will be available for this. Note- the CS3 is also chosen over AF31 because WRED will work better with Assured Forwarding 1,2 and 3 for different levels of drop probabilities.

Note that the 3550 does not support NBAR, which might have worked to classify the RTCP traffic. Instead, we are restricted to using ACLs.

Once the policy-map is created, it is applied to the input interface where the ATA is connected. And, the interfaces to the routers are configured as trusted, meaning, any packet received with specific DSCP values will not be re-marked or reset to default. We do this for the return traffic.

This takes care of our 2 outbound traffic classes. Practically, for the other direction, there is no need to mark and classify this traffic as it enters the routers from the Internet because we are dealing with high-speed FastEthernet interfaces toward the LAN, which will most likely not get congested. However, for the purpose of the exercise, we will classify and mark this traffic so we can action it in the future on the internal LAN if we need to, as well as for reporting purposes (i.e. to be able to generate NetFlow reports based on DSCP).

On the Cisco routers, we will classify and mark the voice and call signaling traffic when it enters from the Internet:

object-group network OBJ_VONAGE_ATA_HOSTS
 description VONAGE ANALOG TELEPHONE ADAPTERS
 host 192.168.8.96

ip access-list extended ACL_VONAGE_ATA_HOSTS
 permit ip any object-group OBJ_VONAGE_ATA_HOSTS
ip access-list extended ACL_VONAGE_SIP
 permit udp any eq 10000 object-group OBJ_VONAGE_ATA_HOSTS

class-map match-all CM_VONAGE_SIP
  match access-group name ACL_VONAGE_SIP
class-map match-all CM_VONAGE_RTCP
  match protocol rtcp
  match access-group name ACL_VONAGE_ATA_HOSTS

class-map match-all CM_VONAGE_VOICE_BEARER
  match protocol rtp
  match access-group name ACL_VONAGE_ATA_HOSTS

class-map match-any CM_VONAGE_CALL_SIGNALING
  match class-map CM_VONAGE_SIP
  match class-map CM_VONAGE_RTCP

policy-map PM_CLASSIFY_AND_MARK
  class CM_VONAGE_VOICE_BEARER
   set dscp ef
  class CM_VONAGE_CALL_SIGNALING
   set dscp cs3

int fa0/1
 desc OUTSIDE - ISP
 ip nbar protocol-discovery
 service-policy input PM_CLASSIFY_AND_MARK

Note that I use an object group since this router runs 12.4(22)T1 and I wanted to test this new functionality which I assume Cisco carried over from the PIX OS. Better late than never!

So, the first ACL permits any traffic with a destination of my ATA. The second one permits traffic with a source of UDP 10000 and a destination of my ATA. This is for the UDP/SIP 20-second interval REGISTER "keepalives" as well as INVITE/On-hook/Off-hook messages.

The CM_VONAGE_SIP class-map matches this ACL. However, call signaling consists not only of SIP, but also of RTCP as stated earlier. So, we also need the CM_VONAGE_RTCP class-map. Thankfully, we can use NBAR for this one. This is why we also have "ip nbar protocol-discovery" applied to the outside interface.

Once the SIP and RTCP class maps are defined, we put them together in a class-map called "CM_VONAGE_CALL_SIGNALING". This class-map matches either one ("OR" boolean).

Then, we classify the voice bearer traffic using a similar method. This time, we tell the router, if your NBAR process receives RTP packets (voice packets) with a destination of the ATA, classify those.

We then create the policy-map, and mark the voice packets with DSCP EF, and the call signaling (RTCP & SIP) with DSCP CS3. We do this by applying the policy-map to the outside interface, inbound.
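The way the class-maps combine can be modeled in a few lines of Python.  This is a sketch of the match-all ("AND") and match-any ("OR") logic only.  The "nbar" field stands in for whatever NBAR decides about the packet and is a made-up simplification, as are the function names and packet dictionaries.

```python
ATA = "192.168.8.96"

def to_ata(pkt):
    # ACL_VONAGE_ATA_HOSTS: any source, destination is the ATA
    return pkt["dst"] == ATA

def is_sip(pkt):
    # ACL_VONAGE_SIP: UDP with source port 10000, destination is the ATA
    return pkt["proto"] == "udp" and pkt["src_port"] == 10000 and to_ata(pkt)

def is_rtcp(pkt):
    # match-all: NBAR says RTCP AND the destination is the ATA
    return pkt["nbar"] == "rtcp" and to_ata(pkt)

def is_voice_bearer(pkt):
    # match-all: NBAR says RTP AND the destination is the ATA
    return pkt["nbar"] == "rtp" and to_ata(pkt)

def is_call_signaling(pkt):
    # match-any: SIP OR RTCP
    return any(match(pkt) for match in (is_sip, is_rtcp))

def mark(pkt):
    """DSCP after PM_CLASSIFY_AND_MARK; classes are checked in policy-map order."""
    if is_voice_bearer(pkt):
        return "ef"
    if is_call_signaling(pkt):
        return "cs3"
    return pkt["dscp"]  # class-default: left unchanged

sip_reply = {"proto": "udp", "src_port": 10000, "dst": ATA, "nbar": None, "dscp": "cs1"}
print(mark(sip_reply))  # prints: cs3
```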

Note- The routers in question run NAT and CBAC so I wasn't sure if the classification would work as it calls the ACL which contains the private IP address of the ATA. And, as far as I can tell, documentation on this is non-existent. But a packet capture reveals the NAT de-translation processing occurs before the QoS classification, hence we are good to go!

Now that we have our voice and call signaling marked in both directions, the last step is to take action on it. Meaning, configure queueing and shaping to prioritize this traffic over the web browsing, email and other less time-sensitive traffic. I will post those configs in the near future.

Saturday, May 30, 2009

Vonage Voice-Bearer and Call Signaling Traffic QoS (Part 1 of 3)

This is the first of 3 posts I am making regarding my experience with Vonage on my Cisco network.

Last week, I got a call from my small cable provider (Stowe Cable/Stowe VoIP) who, by the way, are awesome and always provide superior customer service. Anyway, the lady called to let me know their VoIP division was bought by Momentum and that they would not be able to carry my home phone number anymore. Because it would be an inconvenience to have to update everyone with a new phone number, I decided to see if there were any other companies that offer VoIP and would be able to carry the number. I searched with Google and found several companies, some of which had already gone out of business, and some major well-known ones. Most provided the ability for me to enter my phone number on their website to see if my number could be transferred to them. I tried with voip.com, 8x8.com, a few others and Vonage.com. Of those, only Vonage could carry my number. So, I signed up, and started the process.

All the while, I set out to understand how this would work. With Stowe VoIP, an EMTA device was placed directly on the coax entering the house from outside, and the phone was connected to it via a regular RJ-11 cable. In contrast, with Vonage, I learned that they would ship me a VDV21-VD which is basically an analog telephone adapter (ATA)-router combo with Ethernet ports instead of coax. Now, because I already have two Cisco routers and a Cisco multilayer switch, I wasn't sure how this would connect. Realizing that my setup is not really 'mainstream', I wanted to know how the VDV21 worked and how I could best integrate it.

A few days later, I received the device and promptly connected the WAN/Internet port to my internal switch. This was my preference because if I could get this to work, it would mean the VoIP system could take advantage of my dual-ISP redundancy. As expected, the adapter obtained an IP address from my DHCP server. I left the LAN port unused because I believe, in my scenario, it would not be needed since I am not using the "router" functionality of the VDV21, and simply using it as an ATA. I then connected an analog phone to the RJ-11 jack on the VDV21, and was able to immediately make an outbound call. Note- no inbound call is possible yet because my number transfer has not completed.

Next order of business was to see if I could also plug the VDV21's LAN port into my network so I could manage the device via its embedded web server. Realizing it would not make sense to connect it into the same subnet, and fearing that the VDV21 was pre-configured as a DHCP server, which would potentially cause havoc by assigning IP addresses in an unwanted range to my internal hosts, I simply connected the LAN port to my laptop port. The laptop obtained 192.168.15.2/24 and I was able to manage the VDV21 device via HTTP. The above method of management was temporary, so I configured the VDV21 to allow remote management (meaning HTTP access from the outside to the WAN/Internet port) by checking the box in "Advanced Setup, Network Options, Remote Config Management". I was then able to assign a static IP address, add the VDV21 device (I made up a hostname) to my internal DNS server, and then administer by name.

Now, before discussing QoS next, one observation that impressed me about Vonage is they support many different ATA models of different brands, and allow high flexibility with different options. For example, if the default 90Kbps call (which I found means the G.711u codec) eats up too much bandwidth, they allow the user to lower this to 50Kbps or 30Kbps via their web-based dashboard, among many other features. This not only makes their service more appealing to geeks like me, but provides for potential extra sales to small businesses that may have similar networks, even if customer tech support might be a bit more complicated for them. I applaud this strategy.

Finally, being somewhat familiar with the requirement for QoS in converged networks, I set out to determine how exactly to configure my switches and routers for voice and call signaling. Via NetFlow, I saw that the voice-bearer traffic leaving the ATA going to the Internet (Vonage's datacenter in the New York area, as determined by a traceroute), already had a DSCP of EF. However, the return traffic was set to DSCP 0. And, I was unable to determine by just looking at NetFlow data, what consisted of call signaling. I therefore used Wireshark on a SPAN port on the Catalyst switch to capture all traffic to/from the ATA and made a call. Here are the results.

Some kind of SIP "REGISTER" keepalives every 20 seconds: UDP/SIP source port 1030 destination port 10000 with DSCP 0

The reply: UDP/SIP source port 10000 destination port 1030 with DSCP CS1 (DSCP CS1 in this case, I assume, is set by one of the ISPs in the path, marking this traffic as scavenger, as per the Cisco SRND)

The off-hook/on-hook/invite/dial-phone-number packets: UDP/SIP/SDP source port 1030 destination port 10000 with DSCP 0 ...again with DSCP CS1 for the return traffic.

During the call, every second, the following occurs: UDP/RTCP source port 10057 destination port 14697, both outbound and return have DSCP 0.

I am unsure if the last one (1-second during-call keepalives) would have a dynamic, different port if I made a second call. I'll have to set up another capture later.

So, based on this, configuring QoS for Vonage will mean classifying and marking the call signaling packets both ways (outbound towards the Internet when entering the Catalyst switch, and also when coming back from the Internet, entering the router), as well as the voice-bearer coming back from the Internet, when it enters the router. Then, assign bandwidth, shape, police, etc. by following some of the suggestions in the Cisco QoS SRND document. I'll post those configs after I've figured them out and they've been tested.

UPDATE- June 1 2009

A few days after, I made more captures and discovered the REGISTER/INVITE/etc. source port varies. For example, instead of UDP 1030 like the other day, now, the 20 second interval messages, as well as the off-hook messages, are sourced from 1048. So, I guess the ACL will have to match the destination port only. Also, the RTCP 1-second keepalive message ports do vary as well. However, they are odd numbers, and I assume from reading the Vonage forums the range is from 10000 to 20000 (same as the RTP traffic). So, to recap, the RTP voice bearer traffic uses a random port (even numbered) between 10000 and 20000, and the RTCP traffic uses a random port (odd numbered, one up from the RTP port).
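The even/odd port pairing from the recap can be written down as a tiny Python sketch. The function names are made up for illustration, and the 10000 to 20000 range is only my assumption from the Vonage forums, so treat this as a guess at the pattern rather than a specification.

```python
# Assumed media port range (from forum reading, not from Vonage documentation).
PORT_MIN, PORT_MAX = 10000, 20000

def rtcp_port_for(rtp_port):
    """RTCP port paired with an even RTP port: one up from it."""
    if rtp_port % 2 != 0:
        raise ValueError("RTP ports are expected to be even")
    return rtp_port + 1

def guess_stream(port):
    """Guess 'rtp' (even) or 'rtcp' (odd) for a port in the assumed range, else None."""
    if not PORT_MIN <= port <= PORT_MAX:
        return None
    return "rtp" if port % 2 == 0 else "rtcp"

print(guess_stream(10057), guess_stream(14697), rtcp_port_for(14696))
# prints: rtcp rtcp 14697
```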

Tuesday, May 26, 2009

Cisco WAAS over DMVPN

I finally got my WAAS setup to work on my home network (the hub site) which consists of a dual headend (DSL, cable) dual-DMVPN and 3 spokes (the spokes are my other family members' homes). In one of the spokes, I installed an edge WAE file engine, and at the hub, a core WAE and a third separate WAE central manager for administration. The motivation behind all this was to learn the technology but also to improve access to the SharePoint site, other HTTP intranet sites, as well as RDP and SSH to/from that spoke. After installing the recovery CD on all 3 WAEs, for some reason, the application traffic policy consisted of "N/A" but seemed to contain all the needed applications and settings. So, I didn't think much of it. But over time, savings were zero, and the graphs were not populating with any optimized traffic. I finally figured out to click the "Restore the default application policies" and "Restore the default application policies and classifiers". I now see reduction in web, SSH and WAFS traffic over the tunnel. Note- because each DMVPN subnet is a /25 and contains other non-WAAS sites, I configured a WCCP redirect ACL on the headend routers to ensure only traffic to/from the single spoke WAAS site gets redirected.

Tuesday, May 12, 2009

Cisco IOS Zone-Based Firewall vs CBAC

Having had some experience with PIX firewalls in the past, I was interested when I learned a similar firewall technology appeared in IOS. So, I've started reading about the newer zone-based firewall in IOS, and wanted to 'upgrade' from my well-working CBAC, if only to simplify DMZ or guest LAN configurations. I found the lack of an equivalent "router-traffic" command to be a major inhibitor, especially because on the routers in question, NTP, Dynamic DNS and GRE/IPSec (DMVPN) are all connections that are initiated by the router, and even with a relaxed or no inbound ACL, this traffic has to be configured (allowed) from the self zone to the Internet. Anyway, by the time I had an almost fully working configuration, it was a lot lengthier than my original CBAC config. I think perhaps, if I had had more than a couple of DMZs, the ZBF might have been worth it. But for a simple 3-zone [Internet Zone, internal Zone, and Guest Zone] setup on a small 800 series router, CBAC is still the way to go! If anyone knows when Cisco will improve/shorten the ZBF configs for such router-initiated traffic as above, I'd be eager to find out!