Ahoy there! This is my blog in which I jot down some of my experiences in IT (stuff related to my job and other random IT stuff). Hope you find something useful. My primary fields of interest in IT are Korn/Bash Shell Scripting, web/middleware/database technologies (ZXTM, Apache, WebLogic Server, Oracle, etc.), IT Operations Management, ITIL and UNIX (any flavour/distribution).

Solutions Archives

Runaway process causes 100% disk utilization

Problem:

A Solaris 9 mountpoint was 100% utilized (as per “df”) and no new files could be added.

df output:

cybergavin@myhost:/dashboard> df -h /dashboard
Filesystem             size   used  avail capacity  Mounted on
/dev/vx/dsk/A19278-S01-7uitx-dg/dashboard
                        16G    16G   2.1M   100%    /dashboard

du output:

cybergavin@myhost:/dashboard> du -sk /dashboard
1789259 /dashboard

Background & Analysis:

As you can see above, “du” and “df” report very different figures for the utilization of /dashboard. The “df” output tells me that I have very little free space (~ 2.1 MB), whereas the “du” output (about 1.7 GB used out of 16 GB) suggests that I should have around 14 GB of free space.
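To make the gap concrete, here is the arithmetic behind those two figures as a small Python sketch. The variable names are mine, and df's "16G" is a rounded value, so treat the result as approximate.

```python
# Figures taken from the df and du outputs above.
DF_SIZE_GB = 16          # "size" column reported by df -h (rounded by df)
DU_USED_KB = 1789259     # total reported by du -sk, in kilobytes


def kb_to_gb(kilobytes):
    """Convert kilobytes to gigabytes (1 GB = 1024 * 1024 KB)."""
    return kilobytes / (1024 * 1024)


du_used_gb = kb_to_gb(DU_USED_KB)
apparent_free_gb = DF_SIZE_GB - du_used_gb

print(f"du sees about {du_used_gb:.1f} GB in use")
print(f"so about {apparent_free_gb:.1f} GB should be free")
```

The roughly 14 GB that du cannot account for is the space this post goes on to track down.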

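Well, first and foremost, df and du both report disk usage, but they do not work in the same way: df reports the block usage recorded by the file system itself, whereas du adds up the sizes of the files it finds by walking the directory tree. Refer to this article to understand the differences between df and du.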

Secondly, the mountpoint /dashboard was mounted on a VxFS. The dmesg output showed the following:

Feb  1 09:29:00 myhost vxfs: [ID 702911 kern.notice] NOTICE: msgcnt 112748 mesg 001: V-2-1: vx_nospace -  /dev/vx/dsk/A19278-S01-7uitx-dg/dashboard file system full (1 block extent)

An explanation for the above (quite obvious) message is given in this Symantec article.

I found a runaway background process (iostat -x 2) that had been running for the past 2 months. It had been launched by a shell script; the script exited, but the process was never killed. The process was redirecting its output to a file, and that file was later deleted. Because the process still held the file open through its stdout file descriptor (1), deleting the file removed only its directory entry: the process kept writing to it and the file system could not release its blocks. The file was no longer visible in the directory tree, so the space it occupied was effectively hidden. To determine how much space the process is actually using through stdout, try the following command (<pid> = process id):

 

ls -l /proc/<pid>/fd/1
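The effect is easy to reproduce on a UNIX-like system. The following Python sketch is my own illustration, not part of the original investigation, and uses a throwaway temporary file. It deletes a file while it is still open and shows that data written afterwards still ends up in the file, even though no directory entry points to it any more.

```python
import os
import tempfile


def write_to_deleted_file(payload):
    """Open a temp file, delete it, keep writing; return (link_count, size)."""
    fd, path = tempfile.mkstemp()
    try:
        os.unlink(path)          # remove the directory entry, like "rm"
        os.write(fd, payload)    # the open descriptor still works
        info = os.fstat(fd)      # inspect the file through the descriptor
        return info.st_nlink, info.st_size
    finally:
        os.close(fd)             # only now can the file system free the space


links, size = write_to_deleted_file(b"x" * 4096)
print(f"links={links} size={size}")
```

A link count of 0 means there is no name left for du to find, while the file's blocks stay allocated until the descriptor is closed. Where lsof is installed, `lsof +L1` lists open files whose link count has dropped below 1.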

 


Solution:

Killed the runaway process and the mountpoint utilization dropped significantly to 14%. Further, df and du outputs correlated.

Root Cause:

A runaway process was consuming most of the disk space, and this consumption was “hidden” because the file to which the process’ stdout was being redirected had been deleted while the process still held it open.

NOTE:

(1) The solution above describes a successful problem-solving experience and may not be applicable to other problems with similar symptoms.

(2) Your rating of this post will be much appreciated. Also, feel free to leave comments.

 


Problem:

Messaging Bridge on WebLogic Server 8.1 does not start. Following errors seen in server log:

####<Jan 27, 2010 10:10:13 AM GMT> <Info> <MessagingBridge> <myhost> <managed1> <ExecuteThread: '4' for queue: 'MessagingBridge'> <<WLS Kernel>> <> <BEA-200032> <Bridge "MyBridge" is configured to disallow degradation of its quality of service in cases where the configured quality of service is unreachable.>
####<Jan 27, 2010 10:10:13 AM GMT> <Error> <MessagingBridge> <myhost> <managed1> <ExecuteThread: '4' for queue: 'MessagingBridge'> <<WLS Kernel>> <> <BEA-200025> <Bridge "MyBridge" failed to start, because the quality of service configured (Exactly-once) is unreachable. This is likely due to an invalid configuration or adapter limitations.>
####<Jan 27, 2010 10:10:13 AM GMT> <Info> <MessagingBridge> <myhost> <managed1> <ExecuteThread: '4' for queue: 'MessagingBridge'> <<WLS Kernel>> <> <BEA-200034> <Bridge "MyBridge" is shut down.>

           

NOTE: MyBridge connects a WebLogic JMS destination (source) to an MQ destination (target).

Background & Analysis:

In order for Messaging Bridges on WebLogic 8.1 to use Exactly-Once QOS, the following requirements must be met:

  • The Messaging Bridge adapter must be jms-xa-adp.rar, whose JNDI name is eis.jms.WLSConnectionFactoryJNDIXA.
  • Connection Factories for source Bridge destinations must be XA-enabled.
  • Connection Factories used for target Bridge Destinations must be XA-enabled.
  • Messaging Bridges must be configured with Exactly-Once QOS.
  • The “QOS Degradation Allowed” checkbox must be unchecked.

With the above, it is recommended that the Messaging Bridges be Synchronous for better performance (fewer transaction commits).

From the log snippet above, you can see that the Messaging Bridge MyBridge could not start because the QOS (Exactly-Once) was unreachable and the Bridge was not allowed to degrade its QOS.

The QOS will typically be unreachable due to adapter, bridge configuration or bridge destination configuration issues as referred to in the log snippet.
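The start-up rule described above can be written down as a small decision function. This is only a sketch of the logic as I understand it from the requirements list and the log messages. The function and parameter names are hypothetical and are not WebLogic APIs.

```python
def exactly_once_bridge_starts(adapter_is_xa, source_cf_is_xa,
                               target_cf_is_xa, degradation_allowed):
    """Return True if a bridge configured for Exactly-once QOS would start.

    Exactly-once is reachable only when the XA adapter is used and the
    connection factories of BOTH bridge destinations are XA-enabled.
    If it is unreachable, the bridge starts only when it is allowed to
    degrade its QOS.
    """
    exactly_once_reachable = (adapter_is_xa
                              and source_cf_is_xa
                              and target_cf_is_xa)
    return exactly_once_reachable or degradation_allowed


# The situation in this post: target (MQ) connection factory was non-XA
# and "QOS Degradation Allowed" was unchecked.
print(exactly_once_bridge_starts(True, True, False, False))   # False
# After enabling XA on the target connection factory.
print(exactly_once_bridge_starts(True, True, True, False))    # True
```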

Solution:

Enabled XA on the ConnectionFactory for the target Bridge Destination on MQ.

In order to satisfy the Exactly-Once QOS, both source and target destination connection factories must be XA-enabled.

Root Cause:

The ConnectionFactory for the target Bridge Destination on MQ was configured as non-XA, thereby preventing the Messaging Bridge from initiating an XA connection from the WebLogic Bridge destination to the MQ Bridge destination. Since the Messaging Bridge was not allowed to lower its QOS to make the connection, it failed to start properly.

 NOTE: By allowing QOS degradation, the MessagingBridge will connect to the MQ destination even if the ConnectionFactory for the Bridge Destination on MQ were non-XA. However, the choice of QOS must be driven by business requirements and not by technical workarounds.

 Reference: Oracle Documentation


Problem:

WebLogic Administration Server (with WebLogic Integration) 8.1 does not start. Following errors seen in stdout/stderr/server logs:

####<Jan 25, 2010 4:30:26 PM GMT> <Error> <JDBC> <myhost> <myadmin> <main> <<WLS Kernel>> <> <BEA-001151> <Data Source "cgDataSource" deployment failed with the following error: null.>
####<Jan 25, 2010 4:30:26 PM GMT> <Info> <JDBC> <myhost> <myadmin> <main> <<WLS Kernel>> <> <BEA-001156> <Stack trace associated with message 001151 follows:
weblogic.common.ResourceException
        at weblogic.jdbc.common.internal.DataSourceManager.createDataSource(DataSourceManager.java:264)
####<Jan 25, 2010 4:30:30 PM GMT> <Error> <WLW> <myhost> <myadmin> <main> <<WLS Kernel>> <> <000000> <Failed to obtain connection to datasource=cgDataSource, using generic DB properties>
####<Jan 25, 2010 4:30:31 PM GMT> <Error> <WLW> <myhost> <myadmin> <main> <<WLS Kernel>> <> <000000> <Error in startup class com.bea.wli.store.DocumentStoreSetup Method: init:
java.lang.IllegalStateException: Unable to start DocumentStore:  com.bea.wli.store.DocumentStoreException: Could not find SQL Document Store cgDataSource
            .
            .

Background & Analysis:

WebLogic Integration (WLI) is a software Business Process Integration framework that runs on WebLogic Server. WLI also includes a console application (wliconsole) to manage WLI configuration. This console application is deployed on the WebLogic Administration Server. Since the console application interacts with the database, it uses default data sources and connection pools (e.g. cgDataSource and cgConnectionPool) for database connectivity.

The errors above indicate that the cgDataSource failed to deploy and consequently, a startup class could not obtain connections to the database, thereby failing deployment and preventing the Administration Server from starting.

Data Sources use Connection Pools to obtain database connections.

Solution:

Ensure that the Connection Pool for the cgDataSource is configured properly (correct JDBC driver, URL, credentials, etc.) and targeted/deployed on the Administration Server (not just the cluster).

Root Cause:

The connection pool (cgConnectionPool) for the data source cgDataSource was not deployed on the Administration Server.

 


Problem:

HPjmeter 4.0 console does not launch after installation on Windows. When trying to launch it, the Command window briefly appears and then disappears without launching the console.

 

Background:

HPjmeter 4.0 console, a component of HPjmeter 4.0, is used to view performance data and analyze profiling and GC log files for a JVM. Being written in Java, the console can run on any platform that supports Java. However, it is currently compatible only with JDK 5 and JDK 6. When you launch the installer (hpjmeter_console_4.0.00_windows_setup.exe), it first scans your HDDs for installed JDKs and, I guess, picks up the first JDK it finds (see screenshot below).

[Screenshot: hpjmeter_installer]

I have both JDK 1.4.2_11 and JDK 6u17 on my HDD. Although only JDK 6u17 meets HPjmeter 4.0 console’s requirements, the installer selected JDK 1.4.2_11 (must have found this JDK first!) and completed the installation. Well, the installer will not even tell you which JDK it has found and used for the installation. You can determine this information only by checking the hpjmeter.bat file in the bin directory within the installation directory. The JDK used by the installer will be the value of the variable JM_JAVA_HOME in the hpjmeter.bat file. So, for my installation, JM_JAVA_HOME indicated that HPjmeter 4.0 console was using JDK 1.4.2_11, thereby not meeting the software requirements.

 

The HPjmeter 4.0 console installer should either search and select only a compatible JDK or prompt you for the location of a compatible JDK if it cannot find one. Although the installer isn’t too smart, the product (HPjmeter 4.0) is a very good and robust performance analysis tool for JVMs.

 Solution:

Ensure you have a working JDK 5 or JDK 6 installation. Ensure that the value of JM_JAVA_HOME in the hpjmeter.bat file is the location of the JDK 5 or JDK 6 installation.
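If you want to check the setting without opening the file by hand, something like the Python sketch below would do. It assumes that hpjmeter.bat assigns the variable with an ordinary `set JM_JAVA_HOME=...` line. I have not reproduced the real file here, so the sample text and the JDK path in it are hypothetical.

```python
import re

# Matches lines such as:  set JM_JAVA_HOME=C:\some\jdk   (quotes optional)
_JM_JAVA_HOME = re.compile(r'^\s*set\s+"?JM_JAVA_HOME=(.*?)"?\s*$',
                           re.IGNORECASE | re.MULTILINE)


def find_jm_java_home(bat_text):
    """Return the value assigned to JM_JAVA_HOME in a batch file, or None."""
    match = _JM_JAVA_HOME.search(bat_text)
    return match.group(1) if match else None


# Hypothetical excerpt of hpjmeter.bat, for illustration only.
sample = "@echo off\r\nset JM_JAVA_HOME=C:\\j2sdk1.4.2_11\r\n"
print(find_jm_java_home(sample))
```

In practice you would read the text from bin\hpjmeter.bat under the installation directory and confirm that the reported path points to a JDK 5 or JDK 6 installation.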

 

Root Cause:

HPjmeter 4.0 console installer detected and used an incompatible JDK during installation. Currently, the HPjmeter 4.0 console is compatible with only JDK 5 and JDK 6.

 


Problem:

Solaris hosts show high CPU utilization caused by Sun’s CST (cstd.agt) and Net Connect (srsproxy) processes.

 

Background:

Configuration and Service Tracker (CST) and Net Connect are tools provided by Sun Microsystems for proactive system management at a customer site. The processes launched by these tools run on Solaris hosts and regularly send data to Sun Microsystems, enabling Sun to track system availability and performance and to continually improve its products and services. cstd.agt and srsproxy are processes belonging to CST and Net Connect respectively. While there have been problems/patches for CST and problems/patches for Net Connect related to high CPU utilization, note that both tools have reached their EOL and Sun no longer supports them. Instead, Sun has replaced these tools with the Services Tools Bundle (STB).

 

 Solution:

Remove CST and Net Connect software from your Solaris host(s) or replace CST and Net Connect with STB.

 

Root Cause:

CST and Net Connect have had problems related to high CPU utilization and, having reached their EOL, are no longer supported by Sun.

 


Problem:

[Screenshot: WebLogic10_ActivateChanges]

When activating changes (clicking the "Activate Changes" button shown in the screenshot above) on the Administration console of a WebLogic 10.0 MP1 domain comprising an admin server and two managed servers (each managed server on a different host), it took around 5 minutes for the activation to complete.

Background:

In WebLogic Server 9.x and later, any change made on the Administration console must go through a three-step process: (1) Lock and Edit, (2) edit the configuration, (3) Activate Changes. It was the third step that took about 5 minutes to complete. The changes were made successfully, albeit after 5 minutes. Interestingly, when we located all the managed servers in the domain on the same host, the problem disappeared and the activation of changes took less than 10 seconds. However, locating all managed servers on one host cannot be a solution. We enabled debug for Deployment on all servers. Given below is the debug output captured while the problem was occurring:

 

####<Sep 29, 2009 10:56:45 AM BST> <Debug> <Deployment> <myhost> <myadmin> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1254218205661> <BEA-000000> <Experienced Exception while c.tryLock() and it is ignored :: java.nio.channels.OverlappingFileLockException
at sun.nio.ch.FileChannelImpl.checkList(FileChannelImpl.java:853)
at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:820)
at java.nio.channels.FileChannel.tryLock(FileChannel.java:967)
at weblogic.deploy.internal.targetserver.datamanagement.ConfigDataUpdate.getFileLock(ConfigDataUpdate.java:374)
at weblogic.deploy.internal.targetserver.datamanagement.ConfigDataUpdate.getFileLock(ConfigDataUpdate.java:357)
at weblogic.deploy.internal.targetserver.datamanagement.ConfigDataUpdate.acquireFileLock(ConfigDataUpdate.java:338)
.
.
.

 

Solution:

After liaising with Oracle Support, we upgraded our JVM and the upgrade resolved the problem. After the upgrade, the activation of changes took less than 10 seconds irrespective of whether the managed servers were located on the same host or not. Details of the upgrade are given below:

Old JVM:

java version "1.5.0_14"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_14-b03)
BEA JRockit(R) (build R27.5.0-110_o-99226-1.5.0_14-20080528-1505-linux-x86_64, compiled mode)

 

New JVM:
java version "1.5.0_17"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_17-b04)
BEA JRockit(R) (build R27.6.3-40_BR8141840-120019-1.5.0_17-20090828-1133-linux-x86_64, compiled mode)

 

Root Cause:

Bug in JVM 1.5.0_14 (JRockit R27.5.0)

 


Problem:

The Administration server of a WebLogic domain comprising WebLogic Server 10.0 and WebLogic Integration 10.2 consumes high CPU and throws java.lang.OutOfMemoryError errors.

 

Background:

The WebLogic Domain’s admin server had only two web applications deployed on it: the WebLogic Administration console and the WebLogic Integration console. After start-up, its CPU utilization gradually increased and reached around 80% within a couple of days. Also, java.lang.OutOfMemoryError errors were observed in the server logs. This behaviour was observed even when there was no load on the managed servers and the web applications on the admin server were not being accessed (all servers idle from a user perspective).

WebLogic Domain details:

Version: WebLogic Server 10.0 MP1, WebLogic Integration 10.2
JVM: JRockit R27.5.0-110 (JRE Standard Edition build 1.5.0_14-b03)
Admin Server JVM Heap: Minimum (Xms) = Maximum (Xmx) = 2 GB
Number of managed servers: 2
Operating System: 64-bit Red Hat Enterprise Linux 5.1
CPU Architecture: AMD64

 

Solution:

The following patches were applied and the problem was resolved. Contact Oracle support or use their Smart Update procedure to obtain the patches.

SL#  PATCH  COMMENTS
1.   D76T   CR380997 Admin server gives OOM: Closed the Queue and Session Objects properly.
2.   LJTR   CR373884 Unable to apply some of the patches for jpd.jar when using "inject" mechanism
3.   ZSX5   BUG8174387 MEMORY LEAK OBSERVED ON ADMIN SERVER: No public details available. Patch provided for WLI 10.2

 

Root Cause:

Known issues with WebLogic Integration 10.2

 


log4j: Could not find resource

Problem:

An application’s log4j framework could not find the log4j configuration file that I had specified. Details, with debug enabled, are below:

 

Command:

java -Dlog4j.debug=true -Dlog4j.configuration=/mysoftware/config/mylog4j.properties ...

 

Debug Output:


log4j: Trying to find [mysoftware/config/mylog4j.properties] using context classloader sun.misc.Launcher$AppClassLoader@164b95.
log4j: Trying to find [mysoftware/config/mylog4j.properties] using sun.misc.Launcher$AppClassLoader@164b95 class loader.
log4j: Trying to find [mysoftware/config/mylog4j.properties] using ClassLoader.getSystemResource().
log4j: Could not find resource: [mysoftware/config/mylog4j.properties].

 

Background:

I did not want to use the default location for the log4j configuration file and wanted to be able to specify a file with a name and location of my choice.

 

Solution:

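The debug output shows that log4j was looking for the resource without its leading “/” (mysoftware/config/mylog4j.properties); that is, it was searching the classpath instead of opening the absolute path I had given. That’s because the file:// scheme was not specified: when the value of log4j.configuration cannot be converted to a URL, log4j treats it as a classpath resource. So, the following worked and enabled log4j to locate the properties file: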

 

java -Dlog4j.debug=true -Dlog4j.configuration=file:///mysoftware/config/mylog4j.properties ...

NOTE: If your log4j configuration file is called log4j.xml or log4j.properties, then placing it in the application classpath will suffice and you will not need the -Dlog4j.configuration option.
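If you generate the java command line from a script, the file URL can be built from the absolute path rather than typed by hand. Here is a minimal Python sketch, assuming a UNIX-style absolute path. The helper name is mine.

```python
from urllib.parse import quote


def to_file_url(absolute_path):
    """Turn an absolute UNIX path into a file:// URL."""
    if not absolute_path.startswith("/"):
        raise ValueError("expected an absolute path starting with '/'")
    # quote() escapes characters such as spaces but leaves '/' untouched.
    return "file://" + quote(absolute_path)


url = to_file_url("/mysoftware/config/mylog4j.properties")
print(f"-Dlog4j.configuration={url}")
```

The three slashes in the result are the two from "file://" followed by the leading "/" of the absolute path.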

 

 

Root Cause:

The log4j configuration file was specified as a plain path rather than as a file:// URL.

 

Reference:

Short Introduction to log4j – Ceki Gülcü

 


Problem:

When using WLShell 2.1.0.1 to connect to a WebLogic server, the connection fails and the following error is displayed:

couldn’t find or load connector class: wlshell.connect.jmx.weblogic.Connector for protocol: t3 check the libraries required by this connector are in the classpath. use the "ver" and "info -v" commands to display the current classpath.

 

Background:

In order for WLShell 2.1.0.1 to start properly and connect to a WebLogic server, the following must be present in the CLASSPATH: weblogic.jar, wlshell-2.1.0.1.jar, wlshell-2.1.0.jar and log4j-1.2.8.jar

My CLASSPATH had all the above jars, but I still received the error.

WebLogic was installed as a user called "bea" and I was running WLShell as my own user (saturg). The two users were not part of the same group, but I ensured that weblogic.jar was accessible and readable by user saturg.

 

Solution:

As I did not have privileges to make both users bea and saturg part of the same group, I executed the following command as the bea user in the WebLogic installation root directory:

 

find . -type f | xargs -i chmod 744 {}

 

So, basically, I ensured that all files in the WebLogic installation were accessible and readable by all users on the host.

Note: The recommended method is to make both the WebLogic installation user and the WLShell user part of the same group, thereby restricting access to the WebLogic installation.
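For anyone unsure what mode 744 actually grants, Python's standard library can render the permission bits the way `ls -l` shows them. This is just an illustration. The 640 value is my own example of a group-restricted mode and is not something used in this post.

```python
import stat


def describe_mode(octal_mode):
    """Render the permission bits of a regular file as `ls -l` would."""
    return stat.filemode(stat.S_IFREG | octal_mode)


print(describe_mode(0o744))   # -rwxr--r--  owner: rwx, group and others: read
print(describe_mode(0o640))   # -rw-r-----  owner: rw, group: read, others: none
```

So 744 makes every file readable by all users on the host, whereas a group-based mode keeps the installation closed to users outside the group.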

 

Root Cause:

The error message is misleading, as the class wlshell.connect.jmx.weblogic.Connector is available in wlshell-2.1.0.jar. The solution above suggests that WLShell needs to access other jars in the WebLogic installation (apart from weblogic.jar), or rather that classes in weblogic.jar need to access other jars in the WebLogic installation.

 


WebLogic – IP Multicast : A primer

In Oracle WebLogic (formerly BEA WebLogic) versions prior to 10.0, WebLogic Servers relied on IP multicast to maintain cluster membership (versions 10.0 and later offer unicast as an alternative, which is preferred over multicast). This article covers IP multicast as used by WebLogic.

What is IP multicast?

IP multicast is a technology for delivering data (datagrams) from one sender to a group of receivers across a network using IP. For IP multicasting, certain special IP addresses called multicast addresses are defined. RFC 3171, the Internet Assigned Numbers Authority (IANA) guidelines for IPv4 multicast address assignments, designates addresses 224.0.0.0 to 239.255.255.255 as multicast addresses. A multicast address is associated with a group of receivers. When a sender wishes to send a datagram to a group of receivers using IP multicast, it sends the datagram to the multicast address/port associated with that group. When routers or switches on the network receive the datagram, they know which servers (receivers) are associated with the multicast address (using IGMP), so they make copies of the datagram and send a copy to every registered receiver. This is illustrated in the figure below:

 

[Figure: HowMulticastingWorks]
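A quick way to check whether an address you have been given falls in the multicast range is Python's standard ipaddress module. This is a small sketch and the sample addresses are only examples.

```python
import ipaddress


def is_multicast(address):
    """Return True if the address lies in the IP multicast range."""
    return ipaddress.ip_address(address).is_multicast


for candidate in ("224.0.0.0", "237.0.0.1", "239.255.255.255", "192.168.1.10"):
    print(candidate, is_multicast(candidate))
```

The first three addresses are reported as multicast; the last one is an ordinary unicast address.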

 

Why does WebLogic use IP multicast?

A WebLogic Cluster is a group of WebLogic servers which provides similar services, with resilience (if a server crashes and exits the cluster, it can rejoin the cluster later), high availability (if a server in the cluster crashes, other servers in the cluster can continue the provision of services) and load balancing (the load on all servers in a cluster can be uniformly distributed) for an application deployed on a WebLogic cluster. WebLogic makes these beneficial clustering features possible by using IP Multicast for the following:

(1) Cluster heartbeats: All servers in a WebLogic cluster must always know which servers are part of the cluster. To make this possible, each server in the cluster uses IP multicast to broadcast regular "heartbeat" messages that advertise its availability.

(2) Cluster-wide JNDI updates: Each WebLogic Server in a cluster uses IP multicast to announce the availability of clustered objects that are deployed or removed locally.

 

How does WebLogic use IP multicast?

All servers in a WebLogic cluster send multicast fragments (heartbeat messages) from their interface addresses to the multicast IP address and port configured for the cluster. Every server in the cluster is registered with that multicast address and port, so each server receives the fragments sent by all other servers as well as the ones it sent out itself. Since every server sends out a fragment every 10 seconds, each server can determine from the fragments it receives which servers are still part of the cluster. If a server (say Server A) does not receive a fragment from another server in the cluster within 30 seconds (3 multicast heartbeat failures), it removes that server from its cluster membership list. When fragments from the removed server start arriving at Server A again, Server A adds the removed server back to its cluster membership list. This way, every server in a WebLogic cluster maintains its own cluster membership list. Regarding cluster-wide JNDI updates, each server instance in the cluster monitors the announcements sent by the other servers and updates its local JNDI tree to reflect current deployments of clustered objects.

Note: Clustered server instances also monitor IP sockets as a more immediate method of determining when a server instance has failed.
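The membership bookkeeping described above can be modelled in a few lines of Python. This is a toy simulation of the idea and not WebLogic's implementation. The class, method and server names are made up, and time is passed in explicitly as a number of seconds.

```python
HEARTBEAT_INTERVAL = 10                    # seconds between heartbeats
FAILURE_TIMEOUT = 3 * HEARTBEAT_INTERVAL   # three missed heartbeats = 30 s


class MembershipList:
    """One server's private view of which cluster members are alive."""

    def __init__(self):
        self.last_heard = {}               # server name -> time of last fragment

    def on_fragment(self, server, now):
        """Record a heartbeat fragment received from `server` at time `now`."""
        self.last_heard[server] = now

    def members(self, now):
        """Servers heard from within the last FAILURE_TIMEOUT seconds."""
        return sorted(name for name, heard in self.last_heard.items()
                      if now - heard < FAILURE_TIMEOUT)


view = MembershipList()                    # Server A's view of servers B and C
view.on_fragment("B", now=0)
view.on_fragment("C", now=0)
view.on_fragment("C", now=10)              # C keeps sending, B goes silent
view.on_fragment("C", now=20)
print(view.members(now=25))                # ['B', 'C']  B silent for 25 s
print(view.members(now=30))                # ['C']       B silent for 30 s
view.on_fragment("B", now=40)              # B's fragments arrive again
print(view.members(now=40))                # ['B', 'C']
```

Each server keeps its own such list, which is why two servers can briefly disagree about who is in the cluster.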

The figure below illustrates how a WebLogic cluster uses IP multicast.

 

[Figure: HowWebLogicUsesIPMulticast]

 

How do you configure and test multicast for a WebLogic Cluster?

Configuring IP Multicast for a WebLogic Cluster is simple. The steps required are given below:

STEP 1: If your WebLogic cluster is part of a network containing other clusters, obtain a multicast address and port for it from your Network Admins. The multicast address and port combination must be unique for every WebLogic cluster: several WebLogic clusters may share the same multicast address if and only if they use different multicast ports. Typically, in organizations, network admins allocate multicast addresses and ports for various projects to ensure there are no conflicts across the network. By default, WebLogic uses a multicast address of 237.0.0.1 and the listen port of the Administration server as the multicast port.

 

STEP 2: Having obtained a multicast address and port for your WebLogic cluster, you must test them before starting your WebLogic cluster to ensure that there are no network glitches and conflicts with other WebLogic clusters. You may do so with the MulticastTest utility provided with the WebLogic installation (part of weblogic.jar). An example test for a cluster containing 2 WebLogic servers on UNIX hosts and using multicast address/port of 237.0.0.1/30000 is given below:

# Run any ONE of the following on both server hosts (within the WebLogic domain directory)
# to set the WebLogic domain environment
. ./setDomainEnv.sh
# . ./setEnv.sh
 
# Command to run on server 1 (within any directory)
${JAVA_HOME}/bin/java utils.MulticastTest -N server1 -A 237.0.0.1 -P 30000
 
# Command to run on server 2 (within any directory)
${JAVA_HOME}/bin/java utils.MulticastTest -N server2 -A 237.0.0.1 -P 30000
 
# NOTE: Run the two java commands concurrently, each on its own WebLogic server host.

 

Screenshots of the tests executed (on Windows Vista) when the WebLogic cluster was running (conflicts between test and running cluster outlined in red) and when the WebLogic cluster was stopped:

 

[Screenshot: utils.MulticastTest with cluster running]
[Screenshot: utils.MulticastTest with cluster stopped]

 

Note: On Vinny Carpenter’s blog, he mentions a problem when using the utils.MulticastTest utility bundled with WebLogic Server 8.1 SP4. Well, I have never faced any issues with the utils.MulticastTest utility, but I am not sure if I’ve used it with WLS 8.1 SP4.

 

STEP 3: After successfully testing the multicast address and port, you may use the WebLogic Administration Console to configure multicast for the cluster. Descriptions of various multicast parameters are available on the console itself. The three most important parameters are (1) Multicast Address, (2) Multicast Port and (3) Interface Address. The Interface Address may be left blank if the WebLogic servers use their hosts’ default interface. On multi-NIC machines or in WebLogic clusters with Network channels, you may have to configure an Interface Address. Given below is a screenshot from a WLS 8.1 SP6 Administration Console indicating the various multicast parameters that may be configured for a cluster. Note that the interface address is on a different screen as it is associated with each server in the cluster, rather than the cluster itself.

 

[Screenshot: ConfigureMulticast]

 

After configuring Multicast for a WebLogic cluster, you can monitor the health of the cluster and exchange of multicast fragments among the servers in the cluster by using the WebLogic Administration console. A screenshot of such a monitoring screen with WLS 8.1 SP6 is given below:

 

[Screenshot: Monitoring a WebLogic Cluster using the Administration Console]

 

Note that the screenshot above indicates that:

(1) All servers are participating in the cluster ("Servers" column).

(2) Every server in the cluster is aware of every other server in the cluster. The "Known Servers" column is especially useful for large clusters to know exactly which servers are not participating in the cluster.

(3) The total number of fragments received by each server (34) is equal to the sum of all the fragments sent by all the servers in the cluster (17 + 17). Note that the "Fragments Sent" and "Fragments Received" columns on the console need not always indicate a correct relationship even if multicast works fine. That’s because these stats on the console are reset to 0 when servers are restarted.
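The relationship in point (3) is simple enough to check with a short script when a cluster has many servers. Here is a sketch with made-up server names. It only holds if no server has been restarted since the counters were last reset.

```python
def expected_fragments_received(fragments_sent):
    """Each server receives every fragment sent in the cluster, its own included."""
    return sum(fragments_sent.values())


# "Fragments Sent" per server as read off the console (names are hypothetical).
sent = {"server1": 17, "server2": 17}
print(expected_fragments_received(sent))   # 34, matching "Fragments Received"
```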

 

Troubleshooting WebLogic’s Multicast configuration

When you encounter a problem with WebLogic multicast (or any problem for that matter), it is important to confirm the problem by executing as many tests as possible and gather as much data as possible when the problem occurs. For WebLogic multicast, you may confirm the problem by using the MulticastTest utility or checking the Administration console as described above. To troubleshoot WebLogic multicast, refer to the Oracle documentation. Also, check the section below to determine if the problem you’ve encountered is similar to one of the problems described, to provide you with a quick resolution.

 

WebLogic Multicast eureka!

Given below are WebLogic multicast problems which I’ve encountered and investigated, along with solutions that worked:

 

PROBLEM 1:

SYMPTOMS: None of the WebLogic servers could see any other server in the cluster. Tests using the MulticastTest utility failed, indicating that each server could receive only the multicast fragments that it had sent out itself.

ANALYSIS: The MulticastTest utility was tried with the correct multicast address, multicast port and interface address. No conflict with any other cluster was observed, but no messages were received from other servers. Assuming that none of the servers in the cluster is hung, the symptoms indicate a problem with the underlying network or the multicast configuration on the network.

SOLUTIONS:

Solution 1: The Network Admin just gave us another multicast address/port pair and multicast tests worked. The multicast address/port pair which failed was not registered correctly on the network.

Solution 2: The Network Admin informed us that more than one switch was used on the cluster network and that this configuration did not ensure that multicast fragments sent by one server in the cluster were copied and transmitted to the other servers in the cluster. Refer to this Cisco document for details regarding this problem and its solutions. As a tactical solution, the Network Admin configured static multicast MAC entries on the switches (Solution 4 in the Cisco document). This tactical solution requires the Network Admin to maintain those static entries, but since there weren’t too many WebLogic clusters using multicast on the network, this solution was chosen.

Solution 3: The two managed servers in a cluster were in geographically separated data centres and several hops across the network were required for the servers to receive each other’s multicast fragments. Increasing the multicast TTL solved this problem, and both the MulticastTest utility and the WebLogic servers were then able to exchange multicast fragments.
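For readers who have not met the multicast TTL before: it is a per-socket setting that limits how many router hops a multicast datagram may cross, and on most IP stacks it defaults to 1, which keeps datagrams on the local subnet. The Python sketch below shows the underlying socket option. It illustrates the concept only, not how WebLogic sets it, and the value 4 is an arbitrary example.

```python
import socket
import struct


def open_multicast_sender(ttl):
    """Create a UDP socket whose multicast datagrams may cross `ttl` hops."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # The option takes a single unsigned byte on most platforms.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                    struct.pack("B", ttl))
    return sock


with open_multicast_sender(ttl=4) as sender:
    effective_ttl = sender.getsockopt(socket.IPPROTO_IP,
                                      socket.IP_MULTICAST_TTL)
    print("multicast TTL:", effective_ttl)
```

The TTL needs to be at least as large as the number of hops between the most distant servers in the cluster.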


PROBLEM 2:

SYMPTOMS: The following errors were seen in the WebLogic managed server logs and the managed servers did not start.

Exception:weblogic.server.ServerLifecycleException: Failed to listen on multicast address
weblogic.server.ServerLifecycleException: Failed to listen on multicast address
        at weblogic.cluster.ClusterCommunicationService.initialize()V(ClusterCommunicationService.java:48)
        at weblogic.t3.srvr.T3Srvr.initializeHere()V(T3Srvr.java:923)
        at weblogic.t3.srvr.T3Srvr.initialize()V(T3Srvr.java:669)
        at weblogic.t3.srvr.T3Srvr.run([Ljava/lang/String;)I(T3Srvr.java:343)
        at weblogic.Server.main([Ljava/lang/String;)V(Server.java:32)
Caused by: java.net.BindException: Cannot assign requested address
        at jrockit.net.SocketNativeIO.setMulticastAddress(ILjava/net/InetAddress;)V(Unknown Source)
        at jrockit.net.SocketNativeIO.setMulticastAddress(Ljava/io/FileDescriptor;Ljava/net/InetAddress;)V(Unknown Source)
        .
        .
        .

ANALYSIS: The errors occurred irrespective of which multicast address/port pair was used. The error indicates that the WebLogic server could not bind to an address from which to send datagrams to the multicast address, i.e. it could not bind to its Interface Address.

SOLUTION: The WebLogic server host was a multi-NIC machine and another interface had to be specified for communication with the multicast address/port. Specifying the correct interface address solved the problem.
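On a multi-NIC machine, it helps to see which local addresses are candidates for the interface address. The sketch below lists every address on interfaces that are up and report multicast support. It is only a rough aid: the class name MulticastInterfaces is my own, and an interface appearing in the list does not prove that the network behind it routes your cluster’s multicast traffic.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

class MulticastInterfaces {

    // Returns "name -> address" for each address on an interface that is up and supports multicast.
    static List<String> candidates() throws SocketException {
        List<String> result = new ArrayList<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return result; // no network interfaces reported on this host
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            if (!nic.isUp() || !nic.supportsMulticast()) {
                continue;
            }
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                result.add(nic.getName() + " -> " + addr.getHostAddress());
            }
        }
        return result;
    }

    public static void main(String[] args) throws SocketException {
        for (String line : candidates()) {
            System.out.println(line);
        }
    }
}
```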


PROBLEM 3:

SYMPTOMS: The following errors were seen in the WebLogic managed server logs. The managed servers were running, but clustering features (like JNDI replication) were not working.

####<May 20, 2008 4:00:58 AM BST> <Error> <Cluster> <kips1host> <kips1_managed1> <[ACTIVE] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1211252458100> <BEA-000170> <Server kips1_managed1 did not receive the multicast packets that were sent by itself>
####<May 20, 2008 4:00:58 AM BST> <Critical> <Health> <kips1host> <kips1_managed1> <[ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1211252458100> <BEA-310006> <Critical Subsystem Cluster has failed. Setting server state to FAILED.
Reason: Unable to receive self generated multicast messages>

ANALYSIS: The errors above indicate that the WebLogic server kips1_managed1 could not receive its own multicast fragments from the multicast address/port. Most likely, the server’s multicast fragments never reached the multicast address/port in the first place. That points to a problem with one of three things: the interface address, the route from the interface address to the multicast address/port, or the multicast address/port itself. The interface address or the route is the more likely culprit: had the multicast address/port been wrong, the server would simply not have received multicast fragments from any other server, as in PROBLEM 1.

SOLUTION: The WebLogic server kips1_managed1 used the -Dhttps.nonProxyHosts and -Dhttp.nonProxyHosts JVM parameters in its server startup command, and the values of these parameters did not contain the name of the host on which kips1_managed1 ran. After including the relevant hostname in these parameter values, the errors stopped occurring. I am not sure how these HTTP proxy parameters affected self-generated multicast messages (will try to investigate this).


PROBLEM 4:

SYMPTOMS: The WebLogic servers were part of the cluster most of the time, but intermittently servers were removed from the cluster and added back, and the LostMulticastMessageCount kept increasing for some servers in the cluster. However, tests using the MulticastTest utility (when the cluster was stopped) were successful.

ANALYSIS: The problem occurred intermittently when the WebLogic servers were running, but never occurred when using the MulticastTest utility. This indicates that the underlying IP multicast works fine and something is preventing the servers in the cluster from IP multicasting properly. Further analysis revealed that the servers had issues with JVM Garbage Collection with long stop-the-world pauses (> 30 secs) during which the JVM did absolutely nothing else other than garbage collection. Also, the times of occurrences of these long GC pauses correlated with the times of increases in LostMulticastMessageCount and removal of servers from the cluster.

SOLUTION: The JVMs hosting the WebLogic servers were tuned to minimize stop-the-world GC pauses to allow the servers to multicast properly. For the specific GC problem I encountered, you may refer to the tuning details here.
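To correlate GC activity with the times at which LostMulticastMessageCount increases, you need timestamps for GC time. GC logging (e.g. -verbose:gc) is the usual way to get them. As an alternative, here is a small sketch that samples the standard GarbageCollectorMXBean counters. The class name GcTimeSnapshot, the 10-second interval and the sample count are my own assumptions. It reports only on the JVM it runs in, so it would have to run inside the server JVM to be useful. The counters are accumulated collection times, which are not necessarily identical to stop-the-world pause times for every collector.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.Date;

class GcTimeSnapshot {

    // Sums the accumulated collection time (in ms) over all collectors of this JVM.
    // Collectors that report -1 (undefined) are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        long previous = totalGcMillis();
        for (int sample = 0; sample < 6; sample++) {   // six samples, 10 seconds apart
            Thread.sleep(10_000);
            long now = totalGcMillis();
            System.out.println(new Date() + " GC time in last interval: " + (now - previous) + " ms");
            previous = now;
        }
    }
}
```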


Reference: Oracle documentation

 

NOTE:

Your rating of this post will be much appreciated. Also, feel free to leave comments (especially if you have constructive negative feedback).

Problem:

When upgrading my blog to Wordpress 2.8.4, the upgrade failed with the following error:

Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 2357046 bytes) in xxx.php on line yyy

 

Background:

I wanted to upgrade my blog to the latest Wordpress version (2.8.4). Also, I was using 10 plugins on my Wordpress blog, the latest addition being GD Star Rating 1.6.4.

 

Solution:

I upgraded GD Star Rating to version 1.6.5 and this upgrade fixed the problem, thereby permitting me to upgrade Wordpress to version 2.8.4. However, after both upgrades, my blog’s dashboard displayed the fatal error in 2 locations as shown in the screenshot below:

WP2.8_memoryissues

Such fatal errors occur when a PHP script hits the threshold for the maximum amount of memory it may consume. Some Wordpress forums indicate that Wordpress 2.8 is more memory-intensive than earlier Wordpress versions. The 33554432 bytes in the error message is exactly 32 MB (32 x 1024 x 1024), which matches the memory_limit of 32MB that my hosting provider had defined in php.ini.

 

In order to override my PHP system memory_limit of 32 MB and allow the Wordpress application to use more memory, I edited the wp-config.php file (in server docroot) and added the following:

 

define('WP_MEMORY_LIMIT', '128M');

 

After I modified and saved wp-config.php, the fatal errors disappeared from my Wordpress Dashboard.

 

Some more investigation revealed how Wordpress sets its memory limit in wp-settings.php via the following code:

 

if ( !defined('WP_MEMORY_LIMIT') )
	define('WP_MEMORY_LIMIT', '32M');
 
if ( function_exists('memory_get_usage') && ( (int) @ini_get('memory_limit') < abs(intval(WP_MEMORY_LIMIT)) ) )
	@ini_set('memory_limit', WP_MEMORY_LIMIT);

 

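So, that’s why defining the WP_MEMORY_LIMIT constant in wp-config.php increases the memory limit for the Wordpress application. wp-config.php is loaded before this code runs, so the default of 32M is skipped, and because PHP’s current memory_limit (32) is lower than the requested value (128), Wordpress calls ini_set to raise it. (You could change the value in wp-settings.php also, but it’s recommended to consolidate all config in one file.)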

 

Root Cause:

The PHP memory limit of 32MB was too low for the Wordpress 2.8.4 application.

 

NOTE:

(1) The solution above describes a successful problem-solving experience and may not be applicable to other problems with similar symptoms.

(2) Your rating of this post will be much appreciated. Also, feel free to leave comments.


Problem:

When executing a korn shell script, execution fails with the following error:

/bin/ksh^M: bad interpreter: No such file or directory

 

Background:

I transferred the executable korn shell script from a backup on my Windows server to a RHEL 5.1 host. On Linux, I opened the file using the vi editor and did not find any ^M characters, even after checking the file with ":set list" in vi.

 

Solution:

I used dos2unix on the file as follows:

dos2unix myscript.ksh

Note: Usually, improper file transfer modes in ftp can cause this problem. However, I used ascii mode and checked the file using the vi editor, but didn’t find anything abnormal. So, dos2unix is your best bet!

 

Root Cause:

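The script had DOS-style (CR LF) line endings after being transferred from a Windows machine to a UNIX host. The carriage return at the end of the first line became part of the interpreter path, so the system looked for an interpreter called "/bin/ksh^M", which does not exist. (vi on Linux is usually vim, which can detect DOS-format files and hide the carriage returns; that is probably why no ^M characters were visible.)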
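If your editor hides the carriage returns, looking at the raw bytes settles the question. Here is a minimal sketch that checks whether the first line of a script ends in CR LF, plus a rough equivalent of the line-ending conversion that dos2unix performs. The class and method names are my own, and this is not how dos2unix itself is implemented.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class ShebangCheck {

    // True if the first line ends with CR LF, i.e. the interpreter path would carry a trailing '\r'.
    static boolean firstLineHasCarriageReturn(byte[] script) {
        for (int i = 0; i < script.length; i++) {
            if (script[i] == '\n') {
                return i > 0 && script[i - 1] == '\r';
            }
        }
        return false; // no line feed found at all
    }

    // Converts CR LF line endings to LF.
    static String toUnixLineEndings(String text) {
        return text.replace("\r\n", "\n");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java ShebangCheck <script>");
            return;
        }
        byte[] content = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(firstLineHasCarriageReturn(content)
                ? "First line ends with CR LF (DOS format)"
                : "First line does not end with CR LF");
    }
}
```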

 

NOTE:

(1) The solution above describes a successful problem-solving experience and may not be applicable to other problems with similar symptoms.

(2) Your rating of this post will be much appreciated. Also, feel free to leave comments.


svn commit fails with HTTP 400 error

Problem:

When performing an svn commit using TortoiseSVN, the commit operation failed with the following error:

Server sent unexpected return value (400 Bad Request) in response to MKACTIVITY

 

Background:

I accessed my SVN repository (http://) using TortoiseSVN via a proxy server and tried to commit some changes.

 

Solution:

I disabled the proxy server in Tortoise SVN’s Settings. i.e. I did NOT use a proxy server to access my SVN repository.

Note: If you must use a proxy server, you may fix this problem by installing certificates and accessing your SVN repository via HTTPS.

 

Root Cause:

TortoiseSVN uses WebDAV to access an SVN repository over HTTP. Proxy servers which are not compliant with WebDAV may reject or mangle requests that use WebDAV methods such as MKACTIVITY. However, if you access the SVN repository via HTTPS and a proxy, the traffic is encrypted, so the proxy server cannot inspect the request and forwards it as is.

 

NOTE:

(1) The solution above describes a successful problem-solving experience and may not be applicable to other problems with similar symptoms.

(2) Your rating of this post will be much appreciated. Also, feel free to leave comments.


Problem:

When trying to install the WebLogic 8.1 SP6 binary (platform816_linux32.bin) on 64-bit Ubuntu 9.04 Desktop, the following errors were observed:

oracle@mrkips-laptop: ./platform816_linux32.bin
oracle@mrkips-laptop: ./platform816_linux32.bin: No such file or directory

 

Background & Analysis:

The first couple of checks (obvious) for such an error are:

(1) Check if the file exists in the appropriate location

(2) Check if the file has the required permissions (read/execute)

The above checks were successful and so the error was misleading.

(3) Using the file command, the following was observed:

oracle@mrkips-laptop:/mrkips/oracle$ file platform816_linux32.bin 
platform816_linux32.bin: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), 
dynamically linked (uses shared libs), for GNU/Linux 2.0.0, stripped

The above output indicates that the WebLogic installer binary is 32-bit and uses 32-bit shared libraries.

(4) The host OS was 64-bit Ubuntu and this could be confirmed using the uname command:

oracle@mrkips-laptop:/mrkips/oracle$ uname -a
Linux mrkips-laptop 2.6.28-13-generic #45-Ubuntu SMP Tue
 Jun 30 22:12:12 UTC 2009 x86_64 GNU/Linux

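32-bit shared libraries are not available by default on 64-bit Ubuntu. Without them (in particular the 32-bit dynamic loader that the binary asks for), the binary cannot be started, and the resulting "No such file or directory" refers to the missing loader rather than to the installer file itself.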
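The 32-bit/64-bit information reported by the file command comes from the ELF header: the file starts with the bytes 0x7F 'E' 'L' 'F', and the fifth byte (EI_CLASS) is 1 for 32-bit and 2 for 64-bit. The sketch below reads just that much. The class name ElfClass is my own, and the output strings are a simplification of what file prints.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

class ElfClass {

    // Classifies a file by the first five bytes of its header.
    static String describe(byte[] header) {
        if (header.length < 5 || header[0] != 0x7F
                || header[1] != 'E' || header[2] != 'L' || header[3] != 'F') {
            return "not an ELF file";
        }
        switch (header[4]) {          // EI_CLASS byte
            case 1:  return "ELF 32-bit";
            case 2:  return "ELF 64-bit";
            default: return "ELF (unknown class)";
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java ElfClass <file>");
            return;
        }
        byte[] header = new byte[5];
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            int read = in.read(header);
            System.out.println(args[0] + ": " + describe(Arrays.copyOf(header, Math.max(read, 0))));
        }
    }
}
```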

 

Solution:

Install ia32-libs, the 32-bit shared library package for use on amd64 and ia64 systems, using either of the following methods:

sudo apt-get install ia32-libs

               OR

Launch the Synaptic Package Manager from the Ubuntu Desktop and install the ia32-libs package along with any required dependencies.

 

Root Cause:

The 32-bit shared libraries required for the installation of the 32-bit WebLogic binary installer were not available by default on 64-bit Ubuntu 9.04 Desktop.

 

NOTE:

(1) The solution above describes a successful problem-solving experience and may not be applicable to other problems with similar symptoms.

(2) Your rating of this post will be much appreciated. Also, feel free to leave comments.

 


DNS caching in JVMs

Problem:

A Java application running on a WebLogic Server uses Messaging Bridges to send messages to JMS destinations on a remote server. The target destinations were configured with the domain name of the remote server. When the remote server changed its IP address and modified DNS entries to map its domain name to the new IP address, the Messaging Bridges could no longer connect to the Remote Server.

 

Background & Analysis:

When a Java application looks up a domain name, the result of the domain name resolution is cached in the java.net.InetAddress class. By default, for Sun Hotspot JVM 1.5.x and earlier, a successful domain name resolution is cached forever (yikes!) and an unsuccessful domain name resolution is cached for 10 seconds. By default, for Sun Hotspot JVM 1.6.x, a successful domain name resolution is cached for 30 seconds (huh, why even bother caching positives only for a few seconds?) and an unsuccessful domain name resolution is cached for 10 seconds.

 

Which parameters control domain name resolution caching?

Either of the following sets of parameters controls caching for successful (+ve) and unsuccessful (-ve) domain name resolutions:

1. networkaddress.cache.ttl and networkaddress.cache.negative.ttl

OR

2. sun.net.inetaddr.ttl and sun.net.inetaddr.negative.ttl

The above parameters represent the time-to-live (expiry period) for java.net.InetAddress cache entries, in seconds. Parameters in (1) are recommended, as parameters in (2) are Sun proprietary parameters and may be removed in future releases.

How to configure the domain name resolution caching parameters?

The networkaddress.cache.ttl and networkaddress.cache.negative.ttl parameters can be configured by setting their values in the ${JAVA_HOME}/jre/lib/security/java.security file.

The sun.net.inetaddr.ttl and sun.net.inetaddr.negative.ttl parameters can be configured by passing them as command-line options to the startup command for the JVM (e.g. -Dsun.net.inetaddr.ttl=30).
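Since networkaddress.cache.ttl and networkaddress.cache.negative.ttl are Java security properties, they can also be set from code with java.security.Security.setProperty, as long as this happens before the JVM performs its first name lookup (the cache policy is read once and then kept). Here is a minimal sketch. The class name DnsCacheTtlConfig and the values 300 and 10 are only examples of mine, not a recommendation for every application.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;

class DnsCacheTtlConfig {

    // Sets the positive and negative DNS cache TTLs (in seconds).
    // Call this before any code in the JVM resolves a host name.
    static void configure(int positiveTtlSeconds, int negativeTtlSeconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(positiveTtlSeconds));
        Security.setProperty("networkaddress.cache.negative.ttl", Integer.toString(negativeTtlSeconds));
    }

    public static void main(String[] args) throws UnknownHostException {
        configure(300, 10); // example values
        System.out.println("networkaddress.cache.ttl = "
                + Security.getProperty("networkaddress.cache.ttl"));
        System.out.println("networkaddress.cache.negative.ttl = "
                + Security.getProperty("networkaddress.cache.negative.ttl"));
        // The first lookup happens only after the properties are in place.
        System.out.println(InetAddress.getByName("localhost"));
    }
}
```

For an application server, editing java.security or the startup command is usually more practical than code, because you rarely control what runs first inside the JVM.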

Sample Code to determine values (especially useful for determining default values):

public class DNScacheTTL
{
  public static void main(String[] args)
  {
    // sun.net.InetAddressCachePolicy is a Sun-internal class, hence the compiler warnings.
    // get() returns the TTL for successful lookups, getNegative() the TTL for failed lookups.
    System.out.println("networkaddress.cache.ttl = sun.net.inetaddr.ttl = " + sun.net.InetAddressCachePolicy.get());
    System.out.println("networkaddress.cache.negative.ttl = sun.net.inetaddr.negative.ttl = " + sun.net.InetAddressCachePolicy.getNegative());
  }
}

The above code will throw warnings when compiled with certain JVMs because it uses Sun’s proprietary methods. You may ignore these warnings.

Examples: The code above was saved as DNScacheTTL.java (the file name must match the class name), compiled and then executed as follows on Sun JVM 1.6:

java DNScacheTTL

networkaddress.cache.ttl = sun.net.inetaddr.ttl = 30

networkaddress.cache.negative.ttl = sun.net.inetaddr.negative.ttl = 10


java -Dsun.net.inetaddr.ttl=100 DNScacheTTL

networkaddress.cache.ttl = sun.net.inetaddr.ttl = 100

networkaddress.cache.negative.ttl = sun.net.inetaddr.negative.ttl = 10


 

Solution:

 

The WebLogic Servers were restarted to clear the java.net.InetAddress caches. Consequently, the messaging bridges looked up the new domain name mapping for the remote server and connected to its target destination successfully.

Root Cause:

 

Domain name lookup caching in the JVM was configured with default parameters. Hence, the old domain name <-> IP address mapping was cached forever.

 

Tips for setting caching parameters:

 

It’s easy to be tempted to set the networkaddress.cache.ttl value to 0 to turn off caching. However, note that Sun wants you to cache for two reasons: (1) to prevent DNS spoofing and (2) to improve performance. DNS spoofing will not be an issue if the systems involved communicate over a secure network. So, assuming your systems are part of a secure network, you must trade off improved performance against service impact when remote interfaces change (which should be a rare occurrence).

If your application is time-critical and cannot even withstand a latency/unavailability of a couple of minutes (e.g. a synchronous call with a user waiting for the response), then turn off caching. However, if your application can tolerate delays of a few minutes or more (e.g. asynchronous message transfers), then you may set the value of networkaddress.cache.ttl to something like 300 (5 minutes). In such a scenario, if a remote system changes its domain name <-> IP address mapping, then your application will connect to the new IP address after 5 minutes, in the worst case. This also ensures that under normal circumstances, if your application makes several connections to the remote server within a 5 minute period, it will look up the domain only once (assuming a successful lookup) in 5 minutes, thereby improving performance.

Refer to the table below for some example scenarios. So, hope you get the idea and set networkaddress.cache.ttl suitably. For almost all purposes, the default value of 10 seconds for networkaddress.cache.negative.ttl will suffice.

 

Application Latency Requirement                  Recommended value for networkaddress.cache.ttl
Low-latency/time-critical                        0 (no caching)
Moderate latency (tolerance of a few minutes)    300 (or depending on latency tolerance)

While considering the above guidelines for setting JVM caching parameters, you must also take into account DNS caching by the Operating System and network devices.
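To make the trade-off concrete, here is a toy model of a cache with a time-to-live. It is NOT how java.net.InetAddress implements its cache. The class TtlCacheSketch, the injected clock and the fake resolver are all my own inventions for illustration. With a TTL of 300 seconds, a changed mapping is picked up 300 seconds after the last real lookup at the latest, and lookups in between never reach the resolver. With a TTL of 0, every lookup reaches the resolver.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

class TtlCacheSketch {
    private final long ttlMillis;
    private final LongSupplier clock;                  // returns "now" in milliseconds
    private final Function<String, String> resolver;   // stands in for a real DNS lookup
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> expiry = new HashMap<>();
    int resolverCalls = 0;                             // how often the resolver was really asked

    TtlCacheSketch(long ttlSeconds, LongSupplier clock, Function<String, String> resolver) {
        this.ttlMillis = ttlSeconds * 1000;
        this.clock = clock;
        this.resolver = resolver;
    }

    // Returns the cached answer while it is fresh, otherwise asks the resolver again.
    String lookup(String host) {
        long now = clock.getAsLong();
        Long expiresAt = expiry.get(host);
        if (expiresAt == null || now >= expiresAt) {
            resolverCalls++;
            values.put(host, resolver.apply(host));
            expiry.put(host, now + ttlMillis);
        }
        return values.get(host);
    }

    public static void main(String[] args) {
        long[] now = {0};                    // fake clock so the demo needs no waiting
        String[] currentIp = {"10.0.0.1"};   // made-up addresses
        TtlCacheSketch cache = new TtlCacheSketch(300, () -> now[0], host -> currentIp[0]);

        System.out.println("t=0s    " + cache.lookup("remote.example"));
        currentIp[0] = "10.0.0.2";           // the remote server changes its IP address
        now[0] = 120_000;
        System.out.println("t=120s  " + cache.lookup("remote.example"));
        now[0] = 300_000;
        System.out.println("t=300s  " + cache.lookup("remote.example"));
        System.out.println("resolver calls: " + cache.resolverCalls);
    }
}
```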

NOTE:

(1) The solution above describes a successful problem-solving experience and may not be applicable to other problems with similar symptoms.

(2) Your rating of this post will be much appreciated. Also, feel free to leave comments.

 

