
Thursday, February 26, 2015

Recover Openstack Ceph data with missing/no monitor(s)

Recently my Ceph monitors got blown away, along with all of the metadata associated with them.  Using this technique I was able to recover some of my data, but it took a lot of sleuthing.



In the top left corner is the script running in a loop over all of the unique 'header' files from the various osds.

The main script is in the top right corner.  Essentially we traverse the servers (nodes) and ceph osd instances throughout the cluster, collecting files (with find) that match the wildcard and are bigger than a byte.

The "wildcard" is the key, "13f2a30976b17" which is defined as replicated header file names for each rbd image on your ceph cluster.  If you had 10 images, with 3 replicas, you would find 30 header files in your cluster, with identical names for the replicas.  This would be okay, even if they are on the same server; because they are in separate osd data folders.

Using SSH we fetch a list of all the matching files on an osd instance and dump it to a temp file.  We then cut on the slash (/) folder separator, dump the resulting list of file names into a new file, and remove the temp.

We then dump all the files into a csv, with the osd node location in column 1 and the file name in column 2.  The -u switch keeps only unique entries, so the replicas are dropped.
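
Roughly, per node, it looks something like this (reusing the PREFIX from the sketch above; node names are hypothetical, and the field numbers assume the default path mentioned in the notes further down):

 for NODE in node-1 node-2 node-3; do
   # list matching, non-empty object files on this osd node
   ssh root@$NODE "find /var/lib/ceph/osd/ -type f -name '*${PREFIX}*' -size +1c" > /tmp/files.txt
   # fields 8 and 9 are the placement group folder and the object file name
   cut -d'/' -f8,9 /tmp/files.txt > /tmp/names.txt
   # node in column 1, object name in column 2
   sed "s/^/$NODE,/" /tmp/names.txt >> objects.csv
   rm -f /tmp/files.txt /tmp/names.txt
 done
 # -u keeps one row per unique object name, so the replicas are dropped
 sort -t',' -k2 -u objects.csv -o objects.csv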

We then execute a little script called scp obs.  The tricky part here is the backslash in the Ceph file names: use double quotes in the scp command and escape the \ with \\.  So that's three backslashes surrounded by double quotes in the scp command.
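
With a hypothetical node and object name (the real ones come out of the csv), the escaping ends up looking something like this:

 # the on-disk object names contain a literal backslash (e.g. rbd\udata.<prefix>...),
 # which becomes three backslashes inside double quotes: one layer of un-escaping by
 # the local shell and one by the remote shell that scp starts
 scp "root@node-1:/var/lib/ceph/osd/ceph-0/current/4.4f/rbd\\\udata.13f2a30976b17.0000000000000000__head_ABCD1234__4" ./objects/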

Finally, once we have all the object files, we 'dd' them together as the final output.
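
A minimal sketch of that reassembly (not the exact script from the repo): each object name ends in its hex offset, and with the default 4MB rbd object size each piece can be written at offset x 4MB, so any missing objects are simply left sparse:

 OUT=recovered.raw
 for OBJ in objects/*; do
   # the last run of 16 hex digits in the name is the object's index within the image
   HEX=$(basename "$OBJ" | grep -oE '[0-9a-f]{16}' | tail -n1)
   # write this chunk at its proper offset without truncating what is already there
   dd if="$OBJ" of="$OUT" bs=4M seek=$((16#$HEX)) conv=notrunc
 done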

Two quick notes:

In my cut command I use columns #8 and #9.  Thinking about it, this could give you a different result depending on where your osd data folder is.  Mine is the default path, /var/lib/ceph/osd/ceph-0/current/4.4f/

For my convenience, at the end I mv the "raw" file to .qcow2, since I know that is what these images are.  This is based on the output of hexdump -C -n 4 -s 0 $first-block, where the first block is the object whose name ends in 16 zeroes (the first block in the object group).  It basically shows me the header of the first block, which is 'QFI' for qcow2.
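
For reference, the qcow2 magic is the four bytes 'Q' 'F' 'I' 0xfb, so on the first object (hypothetical file name again) the check looks like:

 hexdump -C -n 4 -s 0 "objects/rbd\udata.13f2a30976b17.0000000000000000__head_ABCD1234__4"
 # expect something like:  00000000  51 46 49 fb  |QFI.|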

I even converted one of the qcow2 files to a VDI and booted it successfully in VirtualBox.

The bash scripts can be found here:
https://github.com/charlescva/ceph-recovery

UPDATE:
It is the next morning, and I let my script run overnight. Check it out. :)


Wednesday, February 18, 2015

Administering Fuel with Openstack Juno Services

I have recently started using Openstack in an environment with 'production' value.  By this I mean that our Openstack instance is becoming a critical component of our business infrastructure, and at this point several development support services are tenants within it.

Openstack is not an easy solution.  Almost every core service is distributed, decentralized, and utilizes the full scope of its dependencies.  This brings good news and bad news.  The good news is that your infrastructure is so loosely coupled that failures will USUALLY be localized to a specific process or configuration setting.  The bad news is, until you learn the terminology and components, you'll be running around like a madman trying to find the various configs and error logs.

Ceph

First you will need to ensure your file system is stable.  Ceph has been with Openstack for a long time.  Yes, it is different from any file system you're likely used to.  This means you'll have to learn something new.  One of the biggest issues with migration and spawning VMs can stem from failures to read/write RAW data to the distributed file system.

The best thing to do first is to read over this paper on RUSH (Replication Under Scalable Hashing): http://www.ssrc.ucsc.edu/Papers/honicky-ipdps04.pdf.

The gist of this paper should help you understand that Ceph clients in Openstack use the Jenkins hash (http://en.wikipedia.org/wiki/Jenkins_hash_function) with a tree of weighted buckets (the CRUSH map, http://ceph.com/docs/master/rados/operations/crush-map/) and a default of 256 placement groups (http://ceph.com/docs/master/rados/operations/placement-groups/) to figure out where objects are stored.  Also that Ceph is not a file system, per se, but an "object store".  This means there is no central server the clients must negotiate with to read and write object data.  The Ceph documentation is phenomenal, and you should familiarize yourself with it as much as you can.  Most of your questions are answered in the documentation; you'll just need to be patient, read it all at a decent pace, and let the information resonate for a night before digging into it again.  After a couple of days it will start to make more sense.  Here are some common commands to take a peek at:

  • ceph osd tree
  • ceph -w
  • ceph osd reweight (don't just run this randomly, understand what it does first)
Also keep in mind there have been bug reports regarding applying a new CRUSH map to a running cluster, so spend a lot of time looking at a sample CRUSH map in a test cluster before applying a new one.  It is likely that you can resolve a lot of your issues by using reweight and/or modifying the number of replicas in heavily used storage pools, like your Openstack volumes, images, and compute pools for ephemeral storage.
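
If you do decide to look at the CRUSH map itself, the usual round trip (per the Ceph docs linked above) is to dump, decompile, edit, recompile, and test it offline before ever injecting it into a live cluster; something like:

 ceph osd getcrushmap -o crushmap.bin          # dump the compiled map from the cluster
 crushtool -d crushmap.bin -o crushmap.txt     # decompile it to readable text
 # ...edit crushmap.txt against a test cluster first...
 crushtool -c crushmap.txt -o crushmap.new     # recompile
 crushtool --test -i crushmap.new --show-statistics   # sanity-check the mappings offline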

RBD (Rados Block Device)

RBD is used on top of the Ceph object store.  This provides the API Openstack uses to connect your volumes and images to the hypervisor you're using (Hopefully QEMU, because I like it and want it supported).  Here are some helpful commands:
  • rados df
  • rbd import
  • rbd export
  • rbd ls|rm|mv
  • qemu-img convert (although not rbd specific, relevant when dealing with RAW rbd images and compressing them to qcow2 for moving across the network; see the sketch just after this list)
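
As a small, hedged example (pool and image names are placeholders; 'images' is just the usual Glance pool on my deployment), pulling an image out of Ceph and shrinking it for transport looks roughly like:

 rbd ls images                                  # find the image id
 rbd export images/<image-uuid> /tmp/image.raw  # dump it as RAW
 qemu-img convert -f raw -O qcow2 -c /tmp/image.raw /tmp/image.qcow2   # compress it for the network
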
An earlier post on this blog covers my experience upgrading Openstack, where I manually migrated each of my VMs from an Icehouse cluster to a Juno one.  I had some hardware constraints and it was tough, but in the end it worked very well.

nova,cinder,glance CLI

You won't get by on the UI alone.  The bash command line on an Openstack controller is your best tool.  Don't be afraid to poke around the databases in mysql for cinder, glance and nova.  Use the nova, glance and cinder tools with the 'help' argument and read the usage.  These tools are required to communicate with the API in a standardized way that is supported by the developers of Openstack.  If you're using 3rd party providers like Mirantis Fuel for Openstack, then you will need to use their documentation for maintaining Openstack environments.  Be advised, some of these 3rd party tools lack the support and capability to perform some of the tasks you will need to properly maintain the environment.

Here are the ones to know:
  • nova boot
    • --availability-zone
    • --nic id
    • --flavor
    • flags for volume- or image-backed instances (see the boot sketch after this list).
  • nova service-list
  • nova service-delete (gets a mention because it's not in Havana, but it is in Juno!)
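
A typical boot, with placeholder IDs (flavor names and zones will differ per deployment), looks something like this:

 nova boot --flavor m1.small \
           --image <image-uuid> \
           --nic net-id=<network-uuid> \
           --availability-zone nova \
           my-test-vm
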
Seriously though, use mysql and don't be afraid to adjust the instance's metadata.  Sometimes a VM is actually off, but the Horizon UI will show it as 'Shutting Down...' or 'Running'.  You can verify the status of your VM by SSHing into the compute node hosting the instance and, as root, running:

# ps -ef | grep kvm

You'll see the instance id in the run command, as well as a bunch of other args.  Be advised, the domain.xml virsh uses is generated in code by python, using the information in mysql, so modifying things like the video driver or video RAM requires changes to the flavor and image metadata.  I recently saw in Juno an option to nova boot with args passing metadata key values to set in the virsh domain, although I have not tried it yet.  I believe it is here: http://docs.openstack.org/cli-reference/content/novaclient_commands.html#novaclient_subcommand_boot, and the boot option appears to be --image-width.
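
When the database and reality disagree, the rows Horizon reads live in the nova database; something like this (database name and credentials are whatever your deployment uses, and the uuid is a placeholder) shows what state nova thinks the instance is in:

 mysql nova -e "SELECT uuid, vm_state, power_state, task_state FROM instances WHERE uuid='<instance-uuid>';"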

Neutron

Neutron is a bit overwhelming.  Just know that the Open vSwitch service on your compute nodes handles the networking for the VMs running there.  Just because your L3 agent(s) are down and you cannot get to a VM using its public IP does not mean that the VM is off; it just means that the external connection isn't being routed.  Ensure all of these services are running and configured correctly (a few quick checks follow the list below).  This section is intentionally short because of the vast configuration options with Neutron.
  • neutron agent-list
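
A quick first pass when a floating IP stops answering, roughly (run these on the controller and on the node hosting the agent in question; adjust for your distro):

 neutron agent-list                   # which agents are alive ('xxx' in the alive column is bad)
 ovs-vsctl show                       # bridges and ports on the compute/network node
 ip netns list | grep qrouter         # router namespaces created by the L3 agent
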
Lastly, I need to thank the developers at Mirantis Fuel and others hanging out on the freenode IRC channel #fuel.  I could not have learned as much as I know at this point without the help of a few users in there.  Thank you guys for your gracious support throughout my adoption of Openstack.

Monday, February 2, 2015

maven: Target server failed to respond

I was getting an exception with the Maven (3.2.2) wagon plugin, version 2.6.

Looking deeper, this plugin uses wagon-providers 2.6, which depends on httpclient 4.3.1.  This can be seen here: http://repo1.maven.org/maven2/org/apache/maven/wagon/wagon-providers/2.6/wagon-providers-2.6.pom
and looks like this:


 ...
 <dependencyManagement>
   <dependencies>
     <dependency>
       <groupId>org.apache.httpcomponents</groupId>
       <artifactId>httpclient</artifactId>
       <version>4.3.1</version>
     </dependency>
     ...

The exception seen when running a maven build looks similar to this:

Caused by: org.apache.maven.wagon.providers.http.httpclient.NoHttpResponseException: The target server failed to respond
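
If you want to see the full stack trace through WagonRepositoryConnector yourself, re-run whatever goal was failing with Maven's error and debug flags:

 # -e prints the full stack trace, -X turns on (very verbose) debug output
 mvn -e -X clean install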

This bug, https://issues.apache.org/jira/browse/HTTPCLIENT-1531, indicates that httpclient versions 4.3 through 4.3.4 (and 4.4 Alpha1) misbehave when using a proxy (like Apache httpd) between the client and server without client authentication enabled.  The bug surfaces as the no-response exception later in the wagon plugin, invoked via the WagonRepositoryConnector class.

Specifically, as of 4.3.x, MainClientExec.java has a function that creates the tunnel to the target (createTunnelToTarget); when authentication is disabled, its for loop does not exit properly and the request is never completed.

http://svn.apache.org/repos/asf/httpcomponents/httpclient/branches/4.3.x/httpclient/src/main/java/org/apache/http/impl/execchain/MainClientExec.java


In our case, just upgrading Maven to 3.2.5 worked.  Another option is to include the lightweight HTTP provider in the wagon configuration; it uses the built-in Java HTTP libraries instead of Apache's implementation.