Monday, 18 November 2013

Real Time Scenario 1



There is an application running on a cluster. Both the app and the JVMs are up and running fine. But when we hit the URL to access the application, the browser shows "page cannot be displayed". What are the troubleshooting steps to resolve the issue? I checked SystemOut.log and SystemErr.log but couldn't find anything helpful.

A request flows through the following tiers:

Load Balancer >> Web server >> App server >> DB

The investigation for such issues should follow the same route. (Well, that's how I do it.)

1) Check whether the URL responds from your end. (This tells you whether the problem is specific to one user or affects everyone.)
2) Check whether the web server is running. If it is not, start it.
3) Check whether the application server and the application are running. If they are not, start them.
4) Try to access the application directly on the app server, i.e. through the HTTP transport port (WC_defaulthost). (This tells you whether the error lies with the app server or with the web server/plug-in.)
5) Check for errors in the app server and web server logs.
6) Check the plug-in configuration file, plugin-cfg.xml, to make sure that the URL being hit is listed there. (If it is not, the plug-in was probably not regenerated and propagated after the deployment.)
7) Enable tracing in plugin-cfg.xml (set LogLevel to Trace) so that the plug-in log shows whether the plug-in is forwarding the request to the app servers.
8) Lastly, check that the page you are requesting is actually available in the web module.
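
The check of the plug-in configuration file (plugin-cfg.xml) can be sketched in a few lines of Python. This is only a rough illustration: the sample XML is a made-up, trimmed-down fragment, the function name is hypothetical, and shell-style wildcard matching is assumed to be a close enough approximation of how the plug-in matches Uri patterns.

```python
import fnmatch
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down plugin-cfg.xml content for illustration only.
SAMPLE_PLUGIN_CFG = """<Config>
  <UriGroup Name="default_host_cluster1_URIs">
    <Uri Name="/shop/*"/>
    <Uri Name="/snoop"/>
  </UriGroup>
</Config>"""


def matching_uri_patterns(plugin_cfg_xml, request_path):
    """Return the <Uri Name="..."> patterns that cover request_path."""
    root = ET.fromstring(plugin_cfg_xml)
    # Collect the Name attribute of every Uri element in the file.
    patterns = [uri.get("Name", "") for uri in root.iter("Uri")]
    # Assumption: shell-style wildcards approximate the plug-in's matching.
    return [p for p in patterns if fnmatch.fnmatchcase(request_path, p)]


if __name__ == "__main__":
    print(matching_uri_patterns(SAMPLE_PLUGIN_CFG, "/shop/cart"))
    print(matching_uri_patterns(SAMPLE_PLUGIN_CFG, "/payroll/login"))
```

An empty result for the URL being hit suggests that the plug-in does not know about the application and needs to be regenerated and propagated.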

What types of tickets come in?


How do tickets arrive, and what are the ticket types?

In our environment we get tickets when an application server is down, an application is down, an application server has hung threads, there is CPU starvation, connections time out, or a web server is down.

Tickets are generated according to the business impact.
Tickets are generally categorized into 5 types:
P1, P2, P3, P4, and P5.
If a high number of users is affected or the business impact is high, we get a P1 ticket.
For example: a web server is down.
If a medium number of users is affected or the business impact is moderate, we get a P2 ticket.
For example: an app server in a clustered environment is down.
If a small number of users is affected or the business impact is low, we get a P3 ticket.
For example: users are getting 500 internal server errors when they access an application.
If the business impact is very low, we get a P4 ticket.
For example: disk space has reached the threshold limit.
P5 tickets generally come in for configuration changes.

Real Time Scenario 4 - Application Having Performance Problem


What are failure and load balancing?
If a request sent to an application comes back with an error, that is called a failure.
An application receives many requests. In a clustered environment the requests are distributed evenly across the cluster members; that is called load balancing.
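
As a rough illustration of even distribution, here is a minimal round-robin sketch in Python. The member names and the function are hypothetical, and plain round robin is an assumption; a real setup usually also takes server weights and session affinity into account.

```python
import itertools


def distribute(request_ids, members, down=()):
    """Assign each request to the next available cluster member in turn."""
    # Members listed in `down` are skipped, so requests only go to healthy ones.
    available = [m for m in members if m not in down]
    if not available:
        raise RuntimeError("no cluster member is available")
    picker = itertools.cycle(available)
    return {req: next(picker) for req in request_ids}


if __name__ == "__main__":
    # Hypothetical cluster with two members.
    print(distribute([1, 2, 3, 4], ["member1", "member2"]))
    print(distribute([1, 2, 3, 4], ["member1", "member2"], down=("member1",)))
```

With both members up, each one receives half of the requests; with one member down, the remaining member takes all of them.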

Suppose we have an application with a performance problem, meaning that requests take a long time to complete. How do we troubleshoot it? Which log files do we need to look at?
Generally, poorly written code or poorly chosen data structures cause performance problems in an application, and performance degrades.
Too many firewalls between the web server and the application server can also cause performance problems.
An application server that is receiving a very large number of requests can cause performance problems as well.
To troubleshoot this situation, we need to check the app server JVM logs, the application logs, and the database logs.
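
One thing worth searching for in the JVM logs is the hung-thread warning (message ID WSVR0605W). The sketch below is only an illustration: the sample log line is invented, and the regular expression assumes the usual wording of that message.

```python
import re

# Assumed shape of the hung-thread warning in SystemOut.log.
HUNG_THREAD = re.compile(
    r'WSVR0605W: Thread "(?P<thread>[^"]+)" \((?P<id>[0-9a-fA-F]+)\) '
    r"has been active for (?P<ms>\d+) milliseconds"
)


def find_hung_threads(log_lines):
    """Return (thread name, active milliseconds) for each hung-thread warning."""
    hits = []
    for line in log_lines:
        match = HUNG_THREAD.search(line)
        if match:
            hits.append((match.group("thread"), int(match.group("ms"))))
    return hits


if __name__ == "__main__":
    # Invented sample lines, for illustration only.
    sample = [
        '[11/18/13 10:15:30:123 EST] 00000025 ThreadMonitor W   WSVR0605W: '
        'Thread "WebContainer : 3" (0000003a) has been active for 612000 '
        "milliseconds and may be hung.",
        "[11/18/13 10:15:31:000 EST] 00000026 SystemOut     O   request done",
    ]
    print(find_hung_threads(sample))
```

In practice you would pass the open SystemOut.log file object instead of the sample list.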

Suppose we have 10 applications in our environment. How does a request reach a particular application? Could you please clarify this?
A request is generally routed to the right application by that application's context root.
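
The idea can be sketched as a longest-prefix match of the request path against the known context roots. The application names and context roots below are hypothetical, and this is a simplification of what the server actually does.

```python
def resolve_application(request_path, context_roots):
    """Return the application whose context root is the longest matching prefix."""
    best_root = None
    best_len = -1
    for root in context_roots:
        prefix = root.rstrip("/")
        # Match on whole path segments, so "/shop" does not match "/shopping".
        if request_path == prefix or request_path.startswith(prefix + "/"):
            if len(prefix) > best_len:
                best_root, best_len = root, len(prefix)
    return context_roots[best_root] if best_root is not None else None


if __name__ == "__main__":
    # Hypothetical context roots and application names.
    roots = {
        "/shop": "ShopApp",
        "/shop/admin": "ShopAdminApp",
        "/payroll": "PayrollApp",
    }
    print(resolve_application("/shop/cart", roots))
    print(resolve_application("/shop/admin/users", roots))
    print(resolve_application("/shopping", roots))
```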

Deployment in PROD Environment

In the production environment, deployments are usually script-based or AutoSys-based.

This eliminates errors during deployment and makes the deployment faster.

If the deployment is done through the console, the procedure is the same as in the other environments,

i.e. Install Application >> browse to the EAR >> Next >> Next >> Next >> Finish :-) (Obviously, you need to select the appropriate module mapping, virtual host, etc.)

As a procedure per se, it is normally done this way:

1) Take a backup of the old EAR in both PROD and DR.
2) Confirm with AD whether the staged EAR is the latest one.
3) The network team flips the DNS to point to DR.
4) Do the deployment in DR.
5) Check the logs and access the DR apps; test the application using the DR URL.
6) If the app is working fine and the AD team confirms it, then
7) the network team flips the DNS back to PROD.
8) Do the deployment in PROD.
9) Check the logs and access the PROD apps; test the application using the PROD URL.

where  AD = Application Development
 
DR = Disaster Recovery
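
The backup in step 1 is easy to script. Below is a minimal sketch that copies an EAR into a backup folder with a timestamp in the file name. The function name, the paths and the naming scheme are all assumptions for illustration, not a standard.

```python
import shutil
import time
from pathlib import Path


def backup_ear(ear_path, backup_dir, stamp=None):
    """Copy the EAR into backup_dir with a timestamp in its name; return the new path."""
    ear = Path(ear_path)
    # Use the current time unless the caller supplies a stamp.
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{ear.stem}-{stamp}{ear.suffix}"
    shutil.copy2(ear, target)  # copy2 also preserves the file's timestamps
    return target


if __name__ == "__main__":
    import sys

    # Usage (hypothetical paths): python backup_ear.py /staging/MyApp.ear /backup/ears
    if len(sys.argv) == 3:
        print(backup_ear(sys.argv[1], sys.argv[2]))
```

Running the same function on both the PROD and the DR hosts keeps the old EAR available for a rollback.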

Real Time Scenario 5 - CPU utilization



Scenario: Suppose that in a clustered environment one of the servers is about to reach 100% CPU utilization while the remaining servers show normal utilization.

How does this situation occur, what causes it, and how do you resolve this kind of problem?



I have come across the same situation, where one JVM was at 97% CPU while the others were using far less.

When I looked at SystemOut.log, a lot of threads were in a hung state.

Solution:
1. Take 3 thread dumps at intervals of 1-2 minutes (kill -3 PID).
2. Kill the process (kill -9 PID).
3. Restart the server/JVM.
4. Analyze the dumps.
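
Step 1 can be scripted so that the dumps are evenly spaced. This is a minimal sketch for Unix-like systems: the function name and the 90-second default are assumptions, and the PID is supplied by the caller. Sending SIGQUIT is what kill -3 does; the JVM keeps running and writes a thread dump (on IBM JVMs typically a javacore file).

```python
import os
import signal
import time


def take_thread_dumps(pid, count=3, interval_seconds=90,
                      send=os.kill, sleep=time.sleep):
    """Send SIGQUIT (the signal behind `kill -3`) to pid `count` times."""
    for i in range(count):
        send(pid, signal.SIGQUIT)
        # Wait between dumps, but not after the last one.
        if i < count - 1:
            sleep(interval_seconds)


if __name__ == "__main__":
    import sys

    # Usage: python thread_dumps.py <PID of the busy JVM>
    if len(sys.argv) == 2:
        take_thread_dumps(int(sys.argv[1]))
```

The send and sleep parameters exist only so the function can be tried out without touching a real process.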

What is the Collector Tool and when do you use it?



Someone asked me how to run the collector tool. The IBM documentation says to run the tool not from APPSERVER_INST_PATH/bin but from a working directory. What exactly does that mean?

The collector tool gathers information about your WebSphere Application Server installation and packages it in a Java archive (JAR) file that you can send to IBM Customer Support to assist in determining and analyzing your problem. Information in the JAR file includes logs, property files, configuration files, operating system and Java data, and the presence and level of each software prerequisite.

Collector command - summary option

WebSphere Application Server products include an enhancement to the collector tool beginning with Version 5.0.2, known as the collector summary option.

The collector summary option helps you communicate with WebSphere Application Server technical staff at IBM Support. Run the collector tool with the -Summary option to produce a lightweight text file and console version of some of the information in the Java archive (JAR) file that the tool produces without the -Summary parameter. You can use the collector summary option to retrieve basic configuration and prerequisite software level information when starting a conversation with IBM Support.

The collector summary option produces version information for the WebSphere Application Server product and the operating system as well as other information. It stores the information in the Collector_Summary.txt file and writes it to the console. You can use the information to answer initial questions from IBM Support or you can send the Collector_Summary.txt file directly to IBM Support.

The collector tool collects key information, including FFDC, configuration, logs, and so on, and packs all of it into a JAR file. Depending on your configuration, the JAR file can take up a lot of space, so the IBM documentation recommends creating a working folder outside the app server installation path.
For example:
mkdir /tmp/collector
cd /tmp/collector

Then run the script as needed
/appserver/instpath/profiles/profilename/bin/collector.sh
or, for the summary only,
/appserver/instpath/profiles/profilename/bin/collector.sh -Summary
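
Before running the tool, it can be worth confirming that the working directory really lies outside the installation path. Here is a small sketch for Unix-style paths; the function name and the install root used in the example are hypothetical.

```python
import os


def is_outside_install(work_dir, install_root):
    """Return True if work_dir is not located under install_root."""
    work = os.path.realpath(work_dir)
    root = os.path.realpath(install_root)
    # If the common path of both is the install root, work_dir lies inside it.
    return os.path.commonpath([work, root]) != root


if __name__ == "__main__":
    # Hypothetical install root, for illustration only.
    root = "/opt/IBM/WebSphere/AppServer"
    print(is_outside_install("/tmp/collector", root))
    print(is_outside_install(root + "/profiles/profilename/bin", root))
```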