Champion of Cyrodiil: Recover Openstack Ceph data with missing/no monitor(s)

Thursday, February 26, 2015

Recover Openstack Ceph data with missing/no monitor(s)

Recently the Ceph monitors got blown away, and with them all of the metadata associated with the monitors. Using this technique I was able to recover some of my data, but it took a lot of sleuthing.

In the top left corner of the screenshot is the script, running in a loop over all of the unique 'header' files from the various OSDs.

The main script is in the top right corner. Essentially, we traverse the servers (nodes) and Ceph OSD instances throughout the cluster, collecting (with find) the files that match the wildcard and are bigger than a byte.
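The traversal can be sketched as one function. This is a minimal sketch and not the script from the repository. The name `find_image_objects` is hypothetical. It assumes passwordless SSH to each node and the default data path /var/lib/ceph/osd/<osd>/current. It keeps only regular files inside `_head` folders, and it quotes the name pattern so that neither the local shell nor the remote shell expands it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<node> <osd> <path>" for every object file
# whose name contains the wildcard, on every OSD of every node given.
# Assumes passwordless SSH and the default /var/lib/ceph/osd layout.
find_image_objects() {
  local wildcard="$1" node osd path
  shift
  for node in "$@"; do
    # Each entry under /var/lib/ceph/osd is one OSD data folder (ceph-0, ...).
    for osd in $(ssh -q "$node" ls /var/lib/ceph/osd); do
      # Single quotes inside the remote command stop the remote shell from
      # globbing the pattern; -type f skips directories such as DIR_4.
      ssh -q "$node" "find /var/lib/ceph/osd/$osd/current -path '*_head/*' -type f -name '*$wildcard*'" |
        while IFS= read -r path; do
          printf '%s %s %s\n' "$node" "$osd" "$path"
        done
    done
  done
}
```

For example, `find_image_objects 13f2a30976b17 proxmox1 proxmox2` would print one line per matching object file, replicas included. Dropping the replicas is left to the later sort step.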

The "wildcard" is the key. It is an ID such as "13f2a30976b17" that appears in the replicated header file name of each RBD image on your Ceph cluster. If you had 10 images with 3 replicas, you would find 30 header files in your cluster, and the replicas would have identical names. That is fine even if two replicas sit on the same server, because they live in separate OSD data folders.
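If you don't know the wildcards yet, one way to collect them is to pull the image ID out of each object file name. This sketch makes some assumptions. It expects RBD format 2 names as stored on FileStore, where the `_` in `rbd_data`/`rbd_header` is escaped as `\u` (for example `rbd\udata.13f2a30976b17.0000000000000000__head_...`). Format 1 images use a different `rb.0.*` prefix, and this sketch does not match them. The name `list_image_ids` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read object file paths on stdin, print each distinct
# RBD image ID once. Assumes FileStore-escaped format 2 names such as
#   rbd\udata.<id>.<object-no>__head_...   or   rbd\uheader.<id>__head_...
list_image_ids() {
  # First drop any directory part, then print only the ID from names that
  # match one of the two patterns above; other lines print nothing.
  sed -nE -e 's#.*/##' -e 's/^rbd\\u(data|header)\.([0-9a-f]+)[._].*/\2/p' | sort -u
}
```

For example, you could pipe the output of a remote `find /var/lib/ceph/osd -name 'rbd*'` into `list_image_ids`. That would list the candidate wildcards, one per image.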

Using SSH, we fetch a list of all the matching files on an OSD instance and dump it to a temp file. We then cut each path on the slash (/) folder separator, write the resulting list of files to a new file, and remove the temp file.
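The fetch-and-cut step for one OSD can also be written as a single pipe, which avoids the temp file. This is a minimal sketch. The name `save_osd_list` is hypothetical. The cut fields 8 and 9 assume the default /var/lib/ceph/osd/<osd>/current/<pg>_head/<file> layout, just as in the post.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list one OSD's matching object files over SSH and
# save them, relative to current/, as <node>.<osd>.data.files.
# Fields 8,9 of /var/lib/ceph/osd/<osd>/current/<pg>_head/<file> are
# "<pg>_head" and "<file>" (field 1 is the empty string before the first /).
save_osd_list() {
  local node="$1" osd="$2" wildcard="$3"
  ssh -q "$node" "find /var/lib/ceph/osd/$osd/current -path '*_head/*' -type f -name '*$wildcard*'" |
    cut -d / -f 8,9 > "$node.$osd.data.files"
}
```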

We then dump all the files into a CSV, with the OSD node location in column 1 and the file name in column 2. The -u switch on sort keeps only unique instances, so replicas are dropped.
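The repository's consolidate-stuff.sh is not reproduced here. This is a sketch of the same idea under that caveat, and the name `merge_lists` is hypothetical. Each line of every <node>.<osd>.data.files list gets its location prefixed as column 1. A sort keyed only on column 2 then keeps one row per object name.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the consolidate + sort steps: prefix each line
# of every <node>.<osd>.data.files list with its location, then keep one
# row per file name (column 2), so replicas collapse to a single row.
merge_lists() {
  local f loc rel
  for f in "$@"; do
    loc=$(basename "$f" .data.files)   # e.g. proxmox2.ceph-3
    # read -r keeps backslashes in the escaped object names intact.
    while IFS= read -r rel; do
      printf '%s,%s\n' "$loc" "$rel"
    done < "$f"
  done | sort -t ',' -k2,2 -u
}
```

With only two columns, `-k2,2` behaves like the post's `-k2`. It just limits the key to column 2 explicitly.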

We then execute a little script called scp-obs.sh. The tricky part here is the backslash in the Ceph file names: use double quotes in the scp command and escape the \ with \\. So that's 3 backslashes surrounded in double quotes in the scp command.
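Instead of counting backslashes by hand, you can let bash's `printf '%q'` produce the escaped form. This sketch assumes the classic scp protocol, where the remote path passes through the remote shell. Recent OpenSSH releases default to SFTP-based transfers, and `scp -O` selects the legacy protocol. The names `remote_spec` and `fetch_objects` are hypothetical. The CSV is assumed to hold `<node>.<osd>,<path relative to current/>` rows, with host names that contain no dots.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build "host:path" with the path escaped for the
# remote shell, e.g. rbd\udata... becomes rbd\\udata...
remote_spec() {
  printf '%s:%q' "$1" "$2"
}

# Hypothetical helper: copy every object listed in a "<node>.<osd>,<rel>"
# CSV into a local directory.
fetch_objects() {
  local csv="$1" dest="$2" loc rel node osd
  mkdir -p "$dest"
  # read -r keeps the backslashes in the file names intact.
  while IFS=',' read -r loc rel; do
    node="${loc%%.*}"   # assumes the host name itself has no dots
    osd="${loc#*.}"
    # </dev/null stops scp/ssh from swallowing the rest of the CSV.
    scp -q "$(remote_spec "$node" "/var/lib/ceph/osd/$osd/current/$rel")" "$dest/" < /dev/null
  done < "$csv"
}
```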

Finally, once we have all the object files, we 'dd' them together into the final output.
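Appending the objects in listing order works as long as every object exists. RBD images are thin-provisioned, though, so objects that were never written may simply be absent. In that case, appending would shift all the data after the gap. An alternative is to write each object at its own offset with `dd seek=`. This sketch assumes the default 4 MiB RBD object size and format 2 names that end in a 16-digit hex object number. The name `assemble_image` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rebuild an image from rbd data object files by
# writing object N at byte offset N * obj_size. Missing objects leave
# holes (read back as zeros) instead of shifting later data.
# Assumes names like rbd\udata.<id>.<16 hex digits>__head_... and, by
# default, 4 MiB objects (the RBD default object size).
assemble_image() {
  local src="$1" out="$2" obj_size="${3:-4194304}" f name num
  : > "$out"
  for f in "$src"/*; do
    [ -f "$f" ] || continue
    name="${f##*/}"
    [[ $name == *data.* ]] || continue        # skip header objects
    num="${name#*data.*.}"                     # drop "rbd\udata.<id>."
    num="${num%%__*}"                          # drop "__head_..." suffix
    # seek counts in bs-sized blocks; 16#... parses the hex object number.
    dd if="$f" of="$out" bs="$obj_size" seek="$((16#$num))" conv=notrunc 2>/dev/null
  done
}
```

If the last objects of an image were never written, the result can be shorter than the original image. If you know the image size, GNU `truncate -s` can extend the file to that size.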

Two quick notes:

In my cut command I use fields 8 and 9. Thinking about it, this could give you a different result depending on where your OSD data folder is. Mine is the default path, /var/lib/ceph/osd/ceph-0/current/4.4f/
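The field numbers are tied to path depth. FileStore can also nest objects inside hashed DIR_* subfolders once a PG folder grows. For such a path, fields 8 and 9 stop at the first DIR_* folder and yield a directory name instead of a file name. That could explain a "not a regular file" scp error like the one in the comments below. This sketch compares the cut with a prefix strip that keeps everything after current/. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: read full object paths on stdin, print the part
# below current/.
# via_cut mirrors the post: fields 8,9 only fit /var/lib/ceph/osd/<osd>/current/.
via_cut() {
  cut -d / -f 8,9
}
# via_strip drops everything up to and including the first "/current/",
# so deeper DIR_* nesting and other data-folder depths are kept intact.
via_strip() {
  local p
  while IFS= read -r p; do
    printf '%s\n' "${p#*/current/}"
  done
}
```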

For my convenience, at the end I mv the "raw" file to .qcow2, since I know that is what these images are. This is based on the output of hexdump -C -n 4 -s 0 <first-block>, where first-block is the object whose number is 16 zeroes (the first block in the object group). It shows the header of the first block, which begins with 'QFI' for qcow2.
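The same check can be scripted. qcow2 files start with the four magic bytes "QFI" followed by 0xfb, so the first four bytes of the first object are enough to tell. This minimal sketch uses od instead of hexdump. The name `guess_format` is hypothetical, and it only recognizes qcow2.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "qcow2" if the file starts with the qcow2
# magic bytes 51 46 49 fb ("QFI\xfb"), otherwise "unknown".
guess_format() {
  local magic
  magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
  if [ "$magic" = "514649fb" ]; then
    echo qcow2
  else
    echo unknown
  fi
}
```

Run it on the object whose number is all zeros before renaming the assembled file.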

I even converted one of the qcow2 files to a VDI and booted it successfully in VirtualBox.

The bash scripts can be found here:
https://github.com/charlescva/ceph-recovery

UPDATE:
It is the next morning, and I let my script run overnight. Check it out. :)

4 comments:

Anonymous, August 31, 2015 at 4:07 PM
Hello!
I used your script and got: scp: /var/lib/ceph/osd/ceph-3/current/1.34_head/DIR_4: not a regular file

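#!/bin/bash
set -e
read -p "HeaderWildcard: " WILDCARD
for i in "2";   # node numbers: proxmox2, ...
do
#echo ****NODE$i****
for x in $(ssh -q proxmox$i ls /var/lib/ceph/osd);
do
#echo ****$x****
# Start each OSD with an empty temp list so the cut/rm below cannot fail
: > proxmox$i.$x.data.files.tmp
# PG folders (*_head, including their DIR_* subfolders) under this OSD
for y in $(ssh -q proxmox$i find /var/lib/ceph/osd/$x/current -type d -size +1b | grep _head);
do
# Quote the pattern so neither the local nor the remote shell globs it
ssh -q proxmox$i find $y -type f -name "'*$WILDCARD*'" >> proxmox$i.$x.data.files.tmp
done
# Fields 8,9 give "<pg>_head/<file>" only for the default data path and only
# when the file is not nested in a DIR_* subfolder (then: "<pg>_head/DIR_N")
cut -d "/" -f 8,9 proxmox$i.$x.data.files.tmp >> proxmox$i.$x.data.files
rm proxmox$i.$x.data.files.tmp
done
done

for f in *.files; do ./consolidate-stuff.sh "$f"; done
rm *.files
# Keep one row per object name (column 2), dropping replicas
sort -t "," -k2 -u result.csv > sorted.results
rm result.csv
./scp-obs.sh
cd test
# Append the fetched objects, in name order, into one image file
for i in *; do dd if="$i" of="$WILDCARD.qcow2" bs=1024 conv=notrunc oflag=append; done
mv "$WILDCARD.qcow2" ..
cd ..
rm -Rf test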
Unknown, December 14, 2016 at 4:26 PM
Hello!

I lost my OpenStack Ceph monitor and now I need to recover the VM images in Ceph. I have 3 OSD nodes: node-12 (osd.0), node-20 (osd.1), and node-35 (osd.2 and osd.3). I would like to use your scripts to recover the images. Could you help me? I have some questions:

* How can I find all the "wildcards"?
* Does each different "wildcard" in the Ceph OSDs correspond to a different VM image in OpenStack?
* There are 6 scripts in your repository. How can I use them (in what execution order) to automate the recovery of all the images on my 3 nodes (4 OSDs)?

Thank you very much in advance. I have been looking for a solution for a long time.