Wednesday, April 23, 2014

Thin Client Computing with Ubuntu Linux

I used the Ubuntu 12.04 alternate ISO and chose the LTSP installation mode from the F4 menu at the ISO boot screen.

LTSP (Linux Terminal Server Project) is a thin client solution for Linux operating systems.  We chose it because Ubuntu Linux is our preferred development platform.  Benefits of LTSP are as follows:

·      Reduced Costs – Thin Clients require fewer resources than traditional Thick clients and therefore have a lower procurement cost.
·      No Licensing Fees – LTSP is open source software released under GPLv2 License.
·      Less Maintenance – The single point of control is the operating system image served to the thin clients.
·      Security – LTSP clients are secured via SSH and are restricted to their own LAN.
·      LTSP Display Manager (LDM) – Python application for remote desktop SSH sessions.  KDM/GDM do not support remote SSH sessions.

Typical LTSP Layout

LTSP is typically run from a single server with two network cards, one attached to the isolated LTSP LAN and one attached to the larger network.  The LTSP server uses NAT to provide connections between thin clients and the rest of the resources on the larger network.  This allows more control over the connections between developers and network systems and services.  Developers still have access to the web services and network-bound APIs they need, without necessarily having access to sensitive management protocols and systems.

LTSP supports a concept called ‘screen scripts’.  Multiple screen scripts can be run at the same time on different virtual consoles (Ctrl + Alt + F[1-9]).  Users can toggle between screens; by default, screen seven is used for LDM.  Screen scripts can also be used to enable rdesktop for connecting to a Windows Server.

LTS Configuration allows many custom configurations to be applied per client machine.  Here are some examples:
# Use nvidia driver for this thin client, overriding auto-detected driver
XSERVER = nvidia

# Set Screen 7 of this client to an RDP session rather than LDM
SCREEN_07 = "rdesktop"
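
To illustrate how per-client settings layer on top of defaults, here is a minimal Python sketch. It assumes the usual lts.conf layout of a [default] section plus sections named after a client's MAC address. The MAC address is a made-up placeholder, and the parsing uses Python's configparser purely for illustration; it is not how LTSP itself reads the file.

```python
import configparser

# Sample lts.conf text. The MAC address is a made-up placeholder.
LTS_CONF = """
[default]
SCREEN_07 = ldm

[00:11:22:33:44:55]
# Use nvidia driver for this thin client, overriding auto-detected driver
XSERVER = nvidia
# Set Screen 7 of this client to an RDP session rather than LDM
SCREEN_07 = "rdesktop"
"""


def load_lts_conf(text):
    """Parse lts.conf-style text; [default] supplies fallback values."""
    parser = configparser.ConfigParser(
        default_section="default",
        delimiters=("=",),
        interpolation=None,
    )
    parser.optionxform = str  # keep option names upper-case as written
    parser.read_string(text)
    return parser


def client_settings(parser, mac):
    """Return the effective settings for one client, quotes stripped."""
    section = parser[mac] if parser.has_section(mac) else parser.defaults()
    return {key: value.strip('"') for key, value in section.items()}


if __name__ == "__main__":
    conf = load_lts_conf(LTS_CONF)
    print(client_settings(conf, "00:11:22:33:44:55"))
    print(client_settings(conf, "aa:bb:cc:dd:ee:ff"))
```

A client with its own section gets its overrides merged over the defaults, while an unknown client falls back to the [default] values alone.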

A new feature of LTSP 5 is the ability to run Linux applications installed in the chroot (“change root”, the image used by the thin clients) directly on the client from within the LDM session.  This reduces server load and enables graphics-intensive multimedia applications and applications that require direct hardware access.  Drawbacks include increased chroot maintenance and increased hardware requirements on the thin clients.

Local devices are also supported on thin clients, so removable media such as CD-ROMs and USB flash drives can still be used.

Printers are supported and spooling is done on the server. No client-side print driver management required.

Sound is redirected from the server to the client using PulseAudio.  This network-aware client-server sound system can easily go through NAT firewalls.

Although we have not yet tested it, LTSP also supports “thick” clients, also known as fat clients.  These client machines use a larger network block device root file system containing a complete OS with all desired additional programs (e.g. Chrome).  Since processes run on the client rather than the server, an admin cannot kill them from a central location.  Internet connectivity is provided directly to the client, so the client needs to be directed to an Internet gateway.

Wednesday, April 9, 2014

ORACLE initialization or shutdown in progress

Our database guy was trying to perform some operations on the Oracle 11g database today and got the following error:

ORA-01033: ORACLE initialization or shutdown in progress

This error can have many underlying causes, but in our situation it occurs when power to the server is cut unexpectedly, so the database transaction logs are not closed properly.

Provided that you configured RMAN backups when you installed the instance (which should have been rather apparent during the installation of the software and the creation of the database), recovery from this situation should be quite smooth.

Commands I entered manually appear after a prompt ($, SQL>, or RMAN>).  Anything else is a response from the command(s).

First, log into the server using the oracle account you created before installing the database software.  Once you are logged into the operating system, run sqlplus on the console as SYSDBA and shut down the database.  (It is probably stuck trying to initialize, since it attempts to start on boot once the power comes back on.)  Then start the instance and manually mount the database, but do NOT open it yet; just quit.
ORACLEBOX:/export/home/oracle$ sqlplus '/as sysdba'
SQL> shutdown abort
ORACLE Instance shut down.
SQL> startup nomount
ORACLE Instance started
SQL> alter database mount;
SQL> quit
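
If you would rather script these steps than type them, the following Python sketch shows one way to do it. It assumes sqlplus is on the PATH of the oracle account and that OS authentication as SYSDBA works, as it does in the session above. The function names are my own, and the "whenever sqlerror" line is an addition that asks SQL*Plus to exit when a SQL statement fails. Running the file only prints the script; call run_sqlplus yourself when you are ready.

```python
import subprocess

# Statements from the session above, in the same order.
MOUNT_STEPS = [
    "shutdown abort",
    "startup nomount",
    "alter database mount;",
]


def build_sqlplus_script(statements):
    """Join statements into a script that exits on a failed SQL statement."""
    lines = ["whenever sqlerror exit failure"]
    lines.extend(statements)
    lines.append("exit")
    return "\n".join(lines) + "\n"


def run_sqlplus(script):
    """Feed the script to sqlplus as SYSDBA using OS authentication."""
    return subprocess.run(
        ["sqlplus", "-S", "/ as sysdba"],
        input=script,
        capture_output=True,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    # Print the script for review instead of touching the database.
    print(build_sqlplus_script(MOUNT_STEPS))
```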
At this point, your instance has been started and the database has been mounted.  Now we run RMAN (Recovery Manager) and request that it perform a recovery.  If you get more errors or this does not work for you, your backup and recovery settings are probably not configured properly, and you are really in trouble.  Otherwise, it should look something like this:
ORACLEBOX:/export/home/oracle$ rman
RMAN> connect target
connected to target database: DBV2 (DBID=2494479496, not open)
using target database control file instead of recovery catalog
RMAN> recover database;
Starting recover at 09-APR-14
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=161 device type=DISK
starting media recovery
archived log for thread 1 with sequence 871 is already on disk as file /u1/app/oracle/oradata/dbv2/redo01.log
archived log file name=/u1/app/oracle/oradata/dbv2/redo01.log thread=1 sequence=871
media recovery complete, elapsed time: 00:00:07
Finished recover at 09-APR-14
RMAN> quit
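
When the recovery is run unattended, it helps to check the captured RMAN output rather than eyeball it. The Python sketch below does that under the assumption that a good run contains the "media recovery complete" and "Finished recover at" lines shown above and no line starting with an RMAN- error code. The function name is my own and the check is deliberately simple.

```python
def recovery_succeeded(rman_output):
    """Return True if the RMAN output looks like a completed recovery."""
    lines = [line.strip() for line in rman_output.splitlines()]
    # RMAN reports failures on lines that begin with an RMAN-nnnnn code.
    has_rman_error = any(line.startswith("RMAN-") for line in lines)
    completed = any(line.startswith("media recovery complete") for line in lines)
    finished = any(line.startswith("Finished recover at") for line in lines)
    return completed and finished and not has_rman_error


if __name__ == "__main__":
    sample = (
        "starting media recovery\n"
        "media recovery complete, elapsed time: 00:00:07\n"
        "Finished recover at 09-APR-14\n"
    )
    print(recovery_succeeded(sample))
```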
If you see something like the above, congratulations, your database has recovered.  Now, let's go back and actually open the database and reset the logs.  Also, make sure to start Enterprise Manager, and make sure your backup admin knows how to connect to and use this tool to some degree.  Although some do not like Enterprise Manager, it is essential for automating tasks for anyone who is not familiar with the basic operations.  You might get hit by a bus, leaving all the work in your colleague's lap.
ORACLEBOX:/export/home/oracle$ sqlplus '/as sysdba'
SQL> alter database open resetlogs;
SQL> quit
ORACLEBOX:/export/home/oracle$ emctl start dbconsole
Oracle Enterprise Manager 11g Database Control Release
Copyright (c) 1996, 2011 Oracle Corporation.  All rights reserved.
Starting Oracle Enterprise Manager 11g Database Control .................... started.
Logs are generated in directory /u1/app/oracle/product/

Your EM Login:

Tuesday, April 8, 2014

Cloudera SCM Agent Error

"This host had been out of contact with Cloudera Manager for too long. The host's Cloudera Manager agent's software version could not be determined."

Today I saw this error pop up on the CM4 hosts monitor.  Running /etc/init.d/cloudera-scm-agent status only confirmed that the agent was running, so I needed to review the logs to find the error.

The log for the agent is located at /var/log/cloudera-scm-agent/cloudera-scm-agent.log

The error reported looked like this:

[08/Apr/2014 15:58:09 +0000] 1228 MainThread agent        ERROR    Heartbeating to failed.
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/src/cmf/", line 741, in send_heartbeat
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/", line 471, in __init__
  File "/usr/lib64/python2.6/", line 720, in connect
  File "/usr/lib64/python2.6/", line 553, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
gaierror: [Errno -5] No address associated with hostname
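
For a busy log, a small script can pull out the ERROR entries together with the last line of any traceback that follows. This Python sketch assumes every agent log entry follows the layout of the single line shown above (bracketed timestamp, PID, thread, module, level, message); the regular expression and function name are my own.

```python
import re

# Layout inferred from the one log line shown above; treat as an assumption.
LOG_LINE = re.compile(
    r"^\[(?P<time>[^\]]+)\]\s+(?P<pid>\d+)\s+(?P<thread>\S+)\s+"
    r"(?P<module>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$"
)


def find_errors(log_text):
    """Return (message, last_traceback_line) for each ERROR entry."""
    errors = []
    current = None
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match:
            # A new log entry ends any traceback we were collecting.
            current = None
            if match.group("level") == "ERROR":
                current = [match.group("message"), None]
                errors.append(current)
        elif current is not None and line.strip():
            # Continuation line: remember the most recent one, which for
            # a traceback is the exception itself.
            current[1] = line.strip()
    return [tuple(entry) for entry in errors]


if __name__ == "__main__":
    path = "/var/log/cloudera-scm-agent/cloudera-scm-agent.log"
    with open(path) as handle:
        for message, cause in find_errors(handle.read()):
            print(message, "->", cause)
```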

The problem was that when the system rebooted, the file /etc/cloudera-scm-agent/config.ini was modified:

# Hostname of Cloudera SCM Server

The DNS server had an old host name entry for the IP address my Cloudera SCM Server was now using.  When the system restarted the agent, I believe a DNS lookup was performed using the IP and resolved to the old host name.  My cluster uses /etc/hosts files to maintain name resolution, so I'm not 100% sure yet why this happened, but I speculate it is a result of the socket library in Python, which is used by the Cloudera SCM agent.
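
One way to spot this kind of stale entry is to compare a forward lookup with the reverse lookup of the resulting address. The Python sketch below uses the standard socket module for that. The function name and the example host names are hypothetical, and the resolver functions are passed in as parameters so the check can be tried without touching real DNS.

```python
import socket


def check_resolution(hostname,
                     forward=socket.gethostbyname,
                     reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, then the IP back to a name, and compare."""
    result = {"hostname": hostname, "ip": None, "reverse": None,
              "consistent": False}
    try:
        result["ip"] = forward(hostname)
    except OSError:
        # socket.gaierror (an OSError subclass): the name does not resolve.
        return result
    try:
        # gethostbyaddr returns (primary_name, aliases, addresses).
        result["reverse"] = reverse(result["ip"])[0]
    except OSError:
        return result
    result["consistent"] = result["reverse"] == hostname
    return result


if __name__ == "__main__":
    # Hypothetical name; replace with the host running your SCM server.
    print(check_resolution("cm-server.example"))
```

If "reverse" comes back as a different name than the one you asked for, the reverse DNS record for that address is a likely suspect.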

I resolved this by changing the server_host value back to the name of the host running the SCM server, then restarting the cloudera-scm-agent service.
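
If several agents need the same correction, the edit can be scripted. This Python sketch rewrites only the server_host line and leaves comments and other settings alone. The function name, the sample config text (including the [General] section and server_port line), and the host names are assumptions for illustration; check them against your own config.ini, and remember to restart the agent afterwards.

```python
import re

# Matches a "server_host=..." line, keeping the key and "=" as group 1.
SERVER_HOST_LINE = re.compile(r"^([ \t]*server_host[ \t]*=[ \t]*).*$",
                              re.MULTILINE)


def set_server_host(config_text, new_host):
    """Return config_text with the server_host value replaced."""
    updated, count = SERVER_HOST_LINE.subn(
        lambda match: match.group(1) + new_host, config_text, count=1
    )
    if count == 0:
        raise ValueError("no server_host line found")
    return updated


if __name__ == "__main__":
    # Assumed sample content; the real file may differ.
    sample = (
        "[General]\n"
        "# Hostname of Cloudera SCM Server\n"
        "server_host=old-name.example\n"
        "server_port=7182\n"
    )
    print(set_server_host(sample, "cm-server.example"))
```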