This post is merely a log of a particular troubleshooting experience initiated by CM IPMEDPRO alarms.

My CM6 system recently notified me of a set of alarms:

Alarm Report
============
Port       Maintenance On   Alt     Alarm   Svc    Ack?  Date
           Name        Brd? Name    Type    State  1 2   Alarmed
02A13      IPMEDPRO    y            MINOR          y    02/05/18:17
02A13      IPMEDPRO    y            MINOR          y    02/05/18:17
02A1302    MEDPROPT    y            WARNING OUT         02/05/18:17
02A1301    MEDPROPT    y            WARNING OUT         02/05/18:17

Logging into SAT, I determined that the alarms were still active:

display alarms

                                 ALARM REPORT

Port      Mtce      On   Alt          Alarm   Svc   Ack? Date        Date
          Name      Brd? Name         Type    State 1 2  Alarmed     Resolved

02A13     IPMEDPRO  y                 MINOR         y    02/05/18:17 00/00/00:00
02A13     IPMEDPRO  y                 MINOR         y    02/05/18:17 00/00/00:00
02A1302   MEDPROPT  y                 WARNING OUT        02/05/18:17 00/00/00:00
02A1301   MEDPROPT  y                 WARNING OUT        02/05/18:17 00/00/00:00

When there are alarms, there are generally corresponding errors that provide more information. We’ll look for all errors in port-network 2:

display errors                                                  Page   1 of   1
                                  ERROR REPORT

      The following options control which errors will be displayed.
         ERROR TYPES

            Error Type:               Error List: active-alarms


         REPORT PERIOD

            Interval: a      From:   /  /  :   To:   /  /  :

         EQUIPMENT TYPE ( Choose only one, if any, of the following )
                       Media Gateway:
                             Cabinet:
                        Port Network: 2
                        Board Number:
                                Port:
                            Category:
                           Extension:
              Trunk ( group/member ):     /

And get this result:

display errors

                     HARDWARE ERROR REPORT - ACTIVE ALARMS

Port      Mtce     Alt             Err   Aux    First/Last   Err Err Rt/ Al Ac
          Name     Name            Type  Data   Occurrence   Cnt Rt  Hr  St

02A1302   MEDPROPT                 1025         02/05/18:17  255 10  11  a  y
                                                02/06/19:08
02A1301   MEDPROPT                 1025         02/05/18:17  255 10  11  a  y
                                                02/06/19:08
02A13     IPMEDPRO                 1025  131    02/04/10:32  9   0   0   a  y
                                                02/05/18:20
02A13     IPMEDPRO                 1793         02/05/18:17  254 10  6   a  y
                                                02/06/19:08

By looking at the listed port numbers, I see that board 02A13 is an IPMEDPRO, and it has two ports in error (02A1301 and 02A1302). Troubleshooting a fault in CM generally starts with a board, rather than a port – particularly if the board itself is in error; port errors could be subsidiary errors.
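The board-versus-port relationship can be read straight off the location strings. Here is a minimal Python sketch, assuming the layout seen in these captures (cabinet digits, a carrier letter, a two-digit slot, and an optional two-digit port). The helper names `parse_location` and `board_of` are hypothetical and not part of any Avaya tool.

```python
import re
from typing import NamedTuple, Optional


class Location(NamedTuple):
    cabinet: int
    carrier: str
    slot: int
    port: Optional[int]  # None when the string names a whole board


# Assumed layout: 1-2 cabinet digits, carrier letter, 2-digit slot, optional 2-digit port.
_LOCATION = re.compile(r"^(\d{1,2})([A-Za-z])(\d{2})(\d{2})?$")


def parse_location(text: str) -> Location:
    match = _LOCATION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized location: {text!r}")
    cabinet, carrier, slot, port = match.groups()
    return Location(int(cabinet), carrier.upper(), int(slot),
                    int(port) if port is not None else None)


def board_of(text: str) -> str:
    """Return the board location that a port (or board) string belongs to."""
    loc = parse_location(text)
    return f"{loc.cabinet:02d}{loc.carrier}{loc.slot:02d}"


print(board_of("02A1301"))  # 02A13
print(board_of("02A1302"))  # 02A13
```

Both ports in error resolve to the same board, which is why the board is the place to start.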

Avaya publishes manuals detailing error types, codes, tests, etc. This particular switch is CM 6.3; we therefore want to refer to Maintenance Alarms for Avaya Aura® Communication Manager, Branch Gateways and Servers, Release 6.3. You can find the most recent publication applicable to your switch by logging into https://support.avaya.com using your Avaya SSO account, and searching for “Maintenance Alarms” while filtering for your switch release.

After finding IPMEDPRO in the index, we find that the IPMEDPRO maintenance object applies to the TN2302 IP Media Processor and the TN2602AP IP Media Resource 320 circuit packs. To determine which circuit pack is in fault, we list the configuration for the board in error:

list configuration board 2a13

                              SYSTEM CONFIGURATION

Board                                                     Assigned Ports
Number   Board Type              Code     Vintage    u=unassigned t=tti p=psa

02A13    IP MEDIA PROCESSOR      TN2602AP HW28 FW061 01 02

Now we can instead look up “IPMEDPRO (TN2602AP IP Media Resource 320)”. The manual lists error log entries and, where possible, a recommended course of action. Using the “display errors” output above, we find more information on the logged errors:

Error Type 1025: a module on the board failed. Aux Data values between 16641 and 16895 indicate a critical problem. See Board Health Query Test (#1652).

Error Type 1793: no electrical signal is detected on the Ethernet cable. The Ethernet cable is unplugged or there is a problem with a connection to the network interface.

It appears that error 1025 is a fairly generic “there’s an on-board module that has failed”, but 1793 gives us something to investigate.
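The Aux Data range quoted for Error Type 1025 is easy to check mechanically. This is a small Python sketch, assuming “between 16641 and 16895” is inclusive at both ends (my reading; the quoted text does not say). The function name `is_critical_aux` is made up for illustration.

```python
from typing import Optional

# Assumption: the quoted range is inclusive, i.e. 16641..16895.
CRITICAL_AUX_RANGE = range(16641, 16896)


def is_critical_aux(aux_data: Optional[int]) -> bool:
    """True when an Error Type 1025 Aux Data value falls in the critical range."""
    return aux_data is not None and aux_data in CRITICAL_AUX_RANGE


# The 1025 entry logged against 02A13 carries Aux Data 131.
print(is_critical_aux(131))   # False
# The 1025 entries against the two ports have no Aux Data at all.
print(is_critical_aux(None))  # False
```

By that reading, the Aux Data of 131 logged against the board is outside the range the manual calls critical.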

CM is able to test equipment on-demand, so that you can see which tests have failed (and resulted in the aforementioned errors):

test board 02a13 long                                                   Page   1

                                 TEST RESULTS

Port        Mtce Name   Alt. Name        Test No.  Result           Error Code

02A13       IPMEDPRO                     52        PASS
02A13       IPMEDPRO                     1402      PASS
02A13       IPMEDPRO                     1371      PASS
02A13       IPMEDPRO                     1383      FAIL
02A13       IPMEDPRO                     1379      FAIL             2805
02A13       IPMEDPRO                     1506      ABORT
02A13       IPMEDPRO                     1511      PASS
02A13       IPMEDPRO                     1405      PASS
02A13       IPMEDPRO                     1629      PASS
02A13       IPMEDPRO                     1652      FAIL             16745
02A13       IPMEDPRO                     1630      ABORT            1115
02A13       IPMEDPRO                     1680      PASS
02A1301     MEDPROPT                     1382      PASS
02A1301     MEDPROPT                     1380      ABORT
02A1301     MEDPROPT                     1407      ABORT            1
02A1302     MEDPROPT                     1382      PASS
02A1302     MEDPROPT                     1380      ABORT
02A1302     MEDPROPT                     1407      ABORT            1

We can see that several demand tests failed. I’ll cite text from each failed test’s error documentation from the aforementioned manual.
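If you keep captures like the one above, picking out the failures can be scripted. The following Python sketch assumes whitespace-separated rows with an empty Alt. Name column, as in this capture; `DemandTest` and `parse_test_results` are hypothetical names, and the sample is an excerpt of the output above.

```python
from typing import List, NamedTuple, Optional


class DemandTest(NamedTuple):
    port: str
    mtce_name: str
    test_no: int
    result: str
    error_code: Optional[int]


RESULTS = ("PASS", "FAIL", "ABORT")


def parse_test_results(capture: str) -> List[DemandTest]:
    """Extract data rows from a pasted 'test board' capture."""
    rows = []
    for line in capture.splitlines():
        fields = line.split()
        # Data rows: port, maintenance name, test number, result, optional error code.
        if len(fields) not in (4, 5):
            continue
        if not fields[2].isdigit() or fields[3] not in RESULTS:
            continue
        code = int(fields[4]) if len(fields) == 5 and fields[4].isdigit() else None
        rows.append(DemandTest(fields[0], fields[1], int(fields[2]), fields[3], code))
    return rows


SAMPLE = """\
Port        Mtce Name   Alt. Name        Test No.  Result           Error Code

02A13       IPMEDPRO                     52        PASS
02A13       IPMEDPRO                     1383      FAIL
02A13       IPMEDPRO                     1379      FAIL             2805
02A13       IPMEDPRO                     1506      ABORT
02A13       IPMEDPRO                     1652      FAIL             16745
02A1301     MEDPROPT                     1382      PASS
02A1301     MEDPROPT                     1407      ABORT            1
"""

rows = parse_test_results(SAMPLE)
failed = [(r.test_no, r.error_code) for r in rows if r.result == "FAIL"]
print(failed)  # [(1383, None), (1379, 2805), (1652, 16745)]
```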

DSP Query Test (#1382) failed with no error code.

The DSP failed. If it continues to fail, it will be taken out of service.

Ping Test (#1379) failed with error code 2805.

The number of pings received did not match the number sent (normally one ping sent). This means that no ping responses were received from the gateway defined on the ip-interface form for the IP Media Processor.
1. Retry the command at 1-minute intervals up to 3 times.

Board Health Query Test (#1652) failed with error code 16745.

The board has a critical error and will be taken out-of-service. Check that the other tests for the board pass. If they do not:
1. Attempt to reset the circuit pack.
2. Rerun the test. If the problem continues, replace the circuit pack.

This tells me that there is a failure, likely hardware in nature, that is causing this circuit pack to be unable to communicate with the local IP network. Let’s determine what IP address it is assigned, so that we can see if the board responds to ping (the reverse direction of the ping it previously tried).

display ip-interface 02a13                                      Page   1 of   3
                                  IP INTERFACES

                           Critical Reliable Bearer? y
                  Type: MEDPRO
                  Slot: 02A13                           Slot: 02A14
           Code/Suffix: TN2602                   Code/Suffix: TN2602
      Enable Interface? y                   Enable Interface? y
                  VLAN: n                               VLAN: n
        Network Region: 2
         VOIP Channels: 320

                                 IPV4 PARAMETERS
             Node Name: [rdctd]2a13               IP Address: 10.2.7.14
   Duplicate Node Name: [rdctd]2a14               IP Address: 10.2.7.15
     Gateway Node Name: [redacted]                IP Address: 10.2.7.1
           Subnet Mask: /22

                              IPV4 COMMON ATTRIBUTES
        Shared Virtual Node Name: [rdctd]virt     IP Address: 10.2.7.17
               Virtual MAC Table: 1
             Virtual MAC Address: 02:04:0d:4a:53:c1
display ip-interface 02a13                                      Page   2 of   3
                                  IP INTERFACES

                                 ETHERNET OPTIONS
                  Slot: 02A13                           Slot: 02A14
                  Auto? n                               Auto? n
                 Speed: 100Mbps                        Speed: 100Mbps
                Duplex: Full                          Duplex: Full
display ip-interface 02a13                                      Page   3 of   3
                                  IP INTERFACES

  VOIP/NETWORK THRESHOLDS
  Enable VoIP/Network Thresholds? n
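Before pinging anything, it is worth confirming that the addresses on this form belong to the same subnet as the gateway. A quick sketch using Python's standard `ipaddress` module, with the addresses and the /22 mask taken from the form above; `same_subnet` is a hypothetical helper.

```python
import ipaddress


def same_subnet(interface_cidr: str, *others: str) -> bool:
    """True when every address in 'others' lies in the interface's network."""
    network = ipaddress.ip_interface(interface_cidr).network
    return all(ipaddress.ip_address(addr) in network for addr in others)


# 02A13 is 10.2.7.14 with a /22 mask.
print(ipaddress.ip_interface("10.2.7.14/22").network)  # 10.2.4.0/22

# Duplicate board, shared virtual address, and gateway from the same form.
print(same_subnet("10.2.7.14/22", "10.2.7.15", "10.2.7.17", "10.2.7.1"))  # True
```

All four addresses sit in 10.2.4.0/22, so the addressing on the form is at least self-consistent.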

Noting that the IPMEDPROs are paired as 10.2.7.14 and 10.2.7.15, we will try to ping both of them:

$ ping -c 5 10.2.7.14
PING 10.2.7.14 (10.2.7.14): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3

--- 10.2.7.14 ping statistics ---
5 packets transmitted, 0 packets received, 100.0% packet loss

$ ping -c 5 10.2.7.15
PING 10.2.7.15 (10.2.7.15): 56 data bytes
64 bytes from 10.2.7.15: icmp_seq=0 ttl=62 time=0.999 ms
64 bytes from 10.2.7.15: icmp_seq=1 ttl=62 time=0.896 ms
64 bytes from 10.2.7.15: icmp_seq=2 ttl=62 time=0.946 ms
64 bytes from 10.2.7.15: icmp_seq=3 ttl=62 time=1.049 ms
64 bytes from 10.2.7.15: icmp_seq=4 ttl=62 time=1.070 ms

--- 10.2.7.15 ping statistics ---
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.896/0.992/1.070/0.064 ms

This seems to confirm CM’s report that the board is offline.
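If you save ping output alongside your other captures, the summary line can be reduced to numbers for a quick comparison. This Python sketch assumes the summary format shown above; the regex also tolerates the shorter “N received” wording some ping implementations print. `parse_ping_summary` is a hypothetical name.

```python
import re
from typing import Tuple

_SUMMARY = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received, ([\d.]+)% packet loss"
)


def parse_ping_summary(output: str) -> Tuple[int, int, float]:
    """Return (sent, received, loss_percent) from captured ping output."""
    match = _SUMMARY.search(output)
    if match is None:
        raise ValueError("no ping summary line found")
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


board_a = "5 packets transmitted, 0 packets received, 100.0% packet loss"
board_b = "5 packets transmitted, 5 packets received, 0.0% packet loss"

print(parse_ping_summary(board_a))  # (5, 0, 100.0)
print(parse_ping_summary(board_b))  # (5, 5, 0.0)
```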

Let’s see if a busyout/release helps:

busyout board 2a13

                             COMMAND RESULTS

  Port      Maintenance Name  Alt. Name          Result           Error Code

  02A13     IPMEDPRO                             PASS
  02A1301   MEDPROPT                             PASS
  02A1302   MEDPROPT                             PASS

release board 2a13

                             COMMAND RESULTS

  Port      Maintenance Name  Alt. Name          Result           Error Code

  02A13     IPMEDPRO                             PASS
  02A1301   MEDPROPT                             PASS
  02A1302   MEDPROPT                             PASS

And after busyout/release, let’s try testing the board again:

test board 2a13 long                                                   Page   1

                                 TEST RESULTS

Port        Mtce Name   Alt. Name        Test No.  Result           Error Code

02A13       IPMEDPRO                     52        PASS
02A13       IPMEDPRO                     1402      PASS
02A13       IPMEDPRO                     1371      PASS
02A13       IPMEDPRO                     1383      FAIL
02A13       IPMEDPRO                     1379      FAIL             2805
02A13       IPMEDPRO                     1506      ABORT            2100
02A13       IPMEDPRO                     1511      PASS
02A13       IPMEDPRO                     1405      PASS
02A13       IPMEDPRO                     1629      PASS
02A13       IPMEDPRO                     1652      FAIL             16745
02A13       IPMEDPRO                     1630      ABORT            1115
02A13       IPMEDPRO                     1680      PASS
02A1301     MEDPROPT                     1382      PASS
02A1301     MEDPROPT                     1380      ABORT
02A1301     MEDPROPT                     1407      ABORT            1
02A1302     MEDPROPT                     1382      PASS
02A1302     MEDPROPT                     1380      ABORT
02A1302     MEDPROPT                     1407      ABORT            1

No change. This is the second time that this alarm has been thrown in the last 30 days. Last time, we swapped out the TN2602AP with an identical spare, and the replacement immediately came in-service. The error recurring leads me to believe that the problem is not a fault of the circuit-pack, but instead another component that may have failed.
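Comparing two test runs by eye is fine for eighteen rows, but a diff makes it harder to miss something. A minimal Python sketch, assuming the same whitespace-separated row layout as the captures above; `index_results` and `changed` are hypothetical helpers, and the two samples are one-row excerpts from the runs before and after the busyout/release.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, int]                # (port, test number)
Outcome = Tuple[str, Optional[int]]  # (result, error code)


def index_results(capture: str) -> Dict[Key, Outcome]:
    """Map (port, test number) to (result, error code) for each data row."""
    table: Dict[Key, Outcome] = {}
    for line in capture.splitlines():
        f = line.split()
        if len(f) in (4, 5) and f[2].isdigit() and f[3] in ("PASS", "FAIL", "ABORT"):
            code = int(f[4]) if len(f) == 5 and f[4].isdigit() else None
            table[(f[0], int(f[2]))] = (f[3], code)
    return table


def changed(before: str, after: str) -> Dict[Key, Tuple[Optional[Outcome], Optional[Outcome]]]:
    """Return only the rows whose outcome differs between two captures."""
    a, b = index_results(before), index_results(after)
    return {k: (a.get(k), b.get(k)) for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}


BEFORE = """\
02A13       IPMEDPRO                     1379      FAIL             2805
02A13       IPMEDPRO                     1506      ABORT
"""
AFTER = """\
02A13       IPMEDPRO                     1379      FAIL             2805
02A13       IPMEDPRO                     1506      ABORT            2100
"""

print(changed(BEFORE, AFTER))
# {('02A13', 1506): (('ABORT', None), ('ABORT', 2100))}
```

In the two full captures above, the only row that differs is test 1506, whose abort now carries error code 2100; every failing test fails exactly as before.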

When troubleshooting problems like these, I generally check for administrative/configuration issues first, then use software diagnostics (i.e., busyout, test board, etc.), then bypass, replace, or mitigate hardware roughly in order of likelihood of failure. In this case, my perceived order of likelihood of hardware failure is:

  1. Cable between far-end IP switchport and CrossFire Adapter
  2. The board itself
  3. The CrossFire Adapter (connects the ethernet cable to the G650)
  4. IP switchport
  5. Port-network/G650 itself (TDM bus, internal cabling, etc)

Except that we’ll strike #2 from this list, because it was recently put into service as a result of a failure appearing to be identical to this one.
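The same bookkeeping can be written down as a tiny Python sketch: a ranked list of suspects, minus whatever has already been ruled out. The labels are shortened from the list above, and `remaining_checks` is a hypothetical helper.

```python
from typing import Iterable, List


def remaining_checks(ranked: List[str], ruled_out: Iterable[str]) -> List[str]:
    """Keep the ranked order, dropping anything already ruled out."""
    skip = set(ruled_out)
    return [item for item in ranked if item not in skip]


RANKED = [
    "cable between IP switchport and CrossFire Adapter",
    "board (TN2602AP)",
    "CrossFire Adapter",
    "IP switchport",
    "port-network/G650",
]

# The board was replaced recently for what looked like the same failure.
plan = remaining_checks(RANKED, ["board (TN2602AP)"])
for step, item in enumerate(plan, start=1):
    print(step, item)
```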

Since that equipment is at an unstaffed remote site, and interrupting it will not cause a noteworthy outage, let’s see what happens if we test the TDM bus. ==A TDM TEST WILL CAUSE A SERVICE INTERRUPTION FOR EVERYTHING IN THE PORT-NETWORK.==

test tdm port-network 2                                             Please Wait

                                 TEST RESULTS

Port        Mtce Name   Alt. Name        Test No.  Result           Error Code

PN 02A      TDM-BUS                      294       PASS
PN 02A      TDM-BUS                      296       PASS
PN 02A      TDM-BUS                      297       ABORT            1005
PN 02B      TDM-BUS                      294       PASS
PN 02B      TDM-BUS                      296       ABORT            1005
PN 02B      TDM-BUS                      297       PASS

The TDM bus passed, but I noticed that my station’s alarm lamp for MINOR shut off. Let’s check that:

display alarms

                                 ALARM REPORT

Port      Mtce      On   Alt          Alarm   Svc   Ack? Date        Date
          Name      Brd? Name         Type    State 1 2  Alarmed     Resolved

02A13     IPMEDPRO  n                 WARNING            02/06/19:38 00/00/00:00
02A13     IPMEDPRO  n                 WARNING            02/06/19:38 00/00/00:00
02A1302   MEDPROPT  y                 WARNING OUT        02/06/19:38 00/00/00:00
02A1301   MEDPROPT  y                 WARNING OUT        02/06/19:38 00/00/00:00
02A13     IPMEDPRO  n                 WARNING            02/06/19:38 00/00/00:00

test board 2a13

                                 TEST RESULTS

Port        Mtce Name   Alt. Name        Test No.  Result           Error Code

02A13       IPMEDPRO                     52        PASS
02A13       IPMEDPRO                     1371      PASS
02A13       IPMEDPRO                     1383      FAIL
02A13       IPMEDPRO                     1379      FAIL             2805
02A13       IPMEDPRO                     1505      ABORT            2806
02A13       IPMEDPRO                     1511      PASS
02A13       IPMEDPRO                     1405      PASS
02A13       IPMEDPRO                     1629      PASS
02A13       IPMEDPRO                     1630      ABORT            1115
02A13       IPMEDPRO                     1680      PASS
02A1301     MEDPROPT                     1407      ABORT            1
02A1302     MEDPROPT                     1407      ABORT            1

Nope, the fault still exists. The switch merely reclassified it.

The next least-effort step is to replace the Ethernet cable between the IP switchport and the CrossFire Adapter.

At this point, I went to the site housing port-network 2.

  • Observed no link-light on IP switch.
  • Observed red LED at top of 2a13.

Re-seated board 2a13.

  • Observed no link-light on IP switch.
  • Observed red LED at top of 2a13.

Replaced ethernet cable between CrossFire Adapter and IP switchport.

  • Observed no link-light on IP switch.
  • Observed red LED at top of 2a13.

Re-checked IP switchport configuration. It’s on a Cisco Catalyst. (I also did this prior to coming on-site, but forgot to mention it.)

interface FastEthernet1/0/5
 description CONNECTION TO AVAYA G650 2a13
 switchport access vlan 2
 speed 100
 duplex full
 mls qos trust dscp
 spanning-tree portfast

The switchport is correctly configured, per the above display ip-interface 02a13 output.
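That comparison (fixed 100 Mbps, full duplex, no auto-negotiation on the board versus the `speed` and `duplex` lines on the switchport) can be sketched in Python. The parser below only looks at those two keywords in a pasted stanza; it is an illustration, not a general IOS configuration parser, and both function names are hypothetical.

```python
from typing import Dict


def parse_interface_stanza(stanza: str) -> Dict[str, str]:
    """Pick the 'speed' and 'duplex' settings out of a pasted interface stanza."""
    settings: Dict[str, str] = {}
    for line in stanza.splitlines():
        words = line.split()
        if len(words) == 2 and words[0] in ("speed", "duplex"):
            settings[words[0]] = words[1].lower()
    return settings


def matches_board(stanza: str, board_speed_mbps: int, board_duplex: str) -> bool:
    """True when the switchport is hard-set to the board's speed and duplex."""
    s = parse_interface_stanza(stanza)
    return s.get("speed") == str(board_speed_mbps) and s.get("duplex") == board_duplex.lower()


STANZA = """\
interface FastEthernet1/0/5
 description CONNECTION TO AVAYA G650 2a13
 switchport access vlan 2
 speed 100
 duplex full
 mls qos trust dscp
 spanning-tree portfast
"""

# Board side, from the ETHERNET OPTIONS page: Auto? n, Speed 100Mbps, Duplex Full.
print(matches_board(STANZA, 100, "Full"))  # True
```

A stanza with no `speed` or `duplex` lines would return False here, which is the case to watch for when the board has auto-negotiation turned off.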

Let’s try bouncing the IP switchport:

switch(config)#int fa1/0/5
switch(config-if)#shutdown
switch(config-if)#no shutdown
switch(config-if)#exit
switch(config)#exit

  • Observed no link-light on IP switch.
  • Observed red LED at top of 2a13.

At this point, I’ve replaced the Ethernet cable and re-seated the board. The next thing in my ordered list of potentially failed hardware is the CrossFire Adapter. I removed the existing CrossFire Adapter and replaced it with a spare.

  • Observed no link-light on IP switch.
  • Observed red LED at top of 2a13.

test board 2a13

                                 TEST RESULTS

Port        Mtce Name   Alt. Name        Test No.  Result           Error Code

02A13       IPMEDPRO                     52        PASS
02A13       IPMEDPRO                     1371      PASS
02A13       IPMEDPRO                     1383      FAIL
02A13       IPMEDPRO                     1379      FAIL             2805
02A13       IPMEDPRO                     1505      ABORT            2806
02A13       IPMEDPRO                     1511      PASS
02A13       IPMEDPRO                     1405      PASS
02A13       IPMEDPRO                     1629      PASS
02A13       IPMEDPRO                     1630      ABORT            1115
02A13       IPMEDPRO                     1680      PASS
02A1301     MEDPROPT                     1407      ABORT            1
02A1302     MEDPROPT                     1407      ABORT            1

That also didn’t help.

The next item on the list to check is the IP switchport. I’m using a Catalyst in this case; the switchports are not easily swappable. Instead, I’ll configure a nearby port such that I can move the cable onto it:

Changing IP switchport from 1/0/5 to 1/0/32

interface FastEthernet1/0/32
 description TESTING BOARD 02A13
 switchport access vlan 272
 speed 100
 duplex full
 mls qos trust dscp
 spanning-tree portfast

Moved cable from Fa1/0/5 to Fa1/0/32.

Feb  6 20:30:31.228: %LINK-3-UPDOWN: Interface FastEthernet1/0/32, changed state to up
Feb  6 20:30:32.234: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet1/0/32, changed state to up

  • Observed immediate link light on the IP switch
  • Observed green lights blinking on board 2a13

display errors

                     HARDWARE ERROR REPORT - ACTIVE ALARMS

Port      Mtce     Alt             Err   Aux    First/Last   Err Err Rt/ Al Ac
          Name     Name            Type  Data   Occurrence   Cnt Rt  Hr  St

test board 2a13

                                 TEST RESULTS

Port        Mtce Name   Alt. Name        Test No.  Result           Error Code

02A13       IPMEDPRO                     52        PASS
02A13       IPMEDPRO                     1371      PASS
02A13       IPMEDPRO                     1383      PASS
02A13       IPMEDPRO                     1379      PASS
02A13       IPMEDPRO                     1505      ABORT            2806
02A13       IPMEDPRO                     1511      PASS
02A13       IPMEDPRO                     1405      PASS
02A13       IPMEDPRO                     1629      PASS
02A13       IPMEDPRO                     1630      PASS
02A13       IPMEDPRO                     1680      PASS
02A1301     MEDPROPT                     1407      PASS
02A1302     MEDPROPT                     1407      PASS

The board immediately came in service, the errors and alarms immediately cleared, and the board passed the test that I demanded.

The only action still needed was to make the appropriate changes to the Catalyst, so that my changes are reflected (and hopefully nobody tries to use Fa1/0/5 in the future):

interface FastEthernet1/0/5
 description PHYSICAL FAULT. DO NOT USE.
 shutdown
!
interface FastEthernet1/0/32
 description CONNECTION TO AVAYA G650 02A13
 switchport access vlan 272
 speed 100
 duplex full
 mls qos trust dscp
 spanning-tree portfast
!

I honestly have no idea whether or not this post will be useful to anyone – please let me know. This really didn’t take me any time to write (<20 minutes). I grab most of the above “screen captures” whenever I’m troubleshooting anyway, so that I have them to refer to immediately afterward. (I recommend that you do the same.)