How to read attributes of an XML tag using an easy shell script - Part 1


We often need to read the value of a particular attribute from an XML file. In Java, the JAXB library is handy for this, but there are situations where the parsing has to be done in a shell script instead.

There are third-party tools such as xml_grep (part of the XML::Twig distribution on CPAN). However, if you want to stick to plain vanilla shell scripting, standard tools such as cat, grep, awk, sed and cut can get the parsing done.

Steps:

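1. cat the XML file in question.
2. grep only the lines that are of interest.
3. Split each line with awk, using the double quote (") as the field separator, since XML attributes take the form attrName="attrValue". The attribute values then land in the even-numbered fields ($2, $4, $6, ...).
4. Read the resulting lines in a loop and split them into variables with cut.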
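To make step 3 concrete, here is a minimal sketch that splits one single-line tag on the double quote and prints every field with the number awk gives it. The function name show_fields is a hypothetical illustration, not part of the original script. It shows why the script further down picks fields such as $2, $4 and $6: those are the ones holding the values.

```shell
#!/bin/bash
# Hypothetical helper: print each "-separated field of one line,
# numbered the way awk sees it when FS is a double quote.
show_fields() {
    printf '%s\n' "$1" |
        awk -F'"' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

show_fields '<Object fileName="demo.properties" propName="0" propValue="Active"/>'
# Expected output:
# $1=[<Object fileName=]
# $2=[demo.properties]
# $3=[ propName=]
# $4=[0]
# $5=[ propValue=]
# $6=[Active]
# $7=[/>]
```

Odd-numbered fields hold the attribute names (plus surrounding text), and even-numbered fields hold the values.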

Consider the following sample XML file:


<policy xmlns="http://www.sample.com/" name="object2_0_0" version="2.0.0" 
        valid="true" overwrite="true" schema-version="2.0.0">

    <Object fileName="demo.properties" propName="0" propValue="Active" 
            propValueType="string" propType="security" propToDelete="false" 
            isNew="No" defaultValue="ActiveUpdated" updated="false" 
            lastmodified="1422940044155" 
            UUID="ca9fce55-ecce-4182-86ce-51f182ef01c0" ObjType="property"/>
    <Object fileName="demo.properties" propName="lockscreen" propValue="true" 
            propValueType="string" propType="security" propToDelete="false" 
            isNew="Yes" defaultValue="def" updated="false" 
            lastmodified="1422940002964" 
            UUID="7d02c8cb-6484-423b-939d-4bd9840fbe5a" ObjType="property"/>
    <Object fileName="configResp.properties" propName="password.min.length" 
            propValue="10" propValueType="string" propType="custom" propToDelete="false" 
            isNew="Yes" defaultValue="10" updated="false" 
            lastmodified="1423023649175" 
            UUID="9dd50dcb-6d0c-4096-adb9-8431598ff708" ObjType="property"/>

</policy> 
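The awk field numbers used in the script below only work when each <Object> tag, with all of its attributes, sits on one line. If your file is wrapped the way this sample is, one possible pre-processing step is sketched here. The function name one_tag_per_line is hypothetical; it joins all lines with tr and then starts a new line at every '>', so each tag ends up on a line of its own. It assumes no attribute value contains a '>' character.

```shell
#!/bin/bash
# Hypothetical helper: put every tag of an XML file on a line of its own.
# Step 1: replace newlines with spaces so wrapped tags are joined.
# Step 2: replace each '>' with a newline so every tag ends a line.
# Assumes no attribute value contains a '>' character.
one_tag_per_line() {
    tr '\n' ' ' < "$1" | tr '>' '\n'
}

# Example (path taken from the script below): feed the joined tags
# to the same grep/awk step.
# one_tag_per_line /etc/demo.xml | grep "propName=" |
#     awk '{print $2,$10,$4,$6}' FS='"' OFS=';'
```

The extra spaces left where line breaks used to be only end up in the odd-numbered fields, so the value positions ($2, $4, $6, $10) are unaffected.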


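Assuming that the order of attributes inside each tag does not change, and that each <Object> tag, with all of its attributes, sits on a single line of the file (the awk field numbers depend on both), the value of a particular attribute can be extracted with a simple shell script: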


#!/bin/bash
# (bash is needed for "echo -en" below)

#store current IFS
SAVEIFS=$IFS

#change the IFS so that only newline (and backspace) separate words
IFS=$(echo -en "\n\b")

#temp file with only those lines from the XML file that we are
#interested in
propFile=/tmp/props.list
xmlFile=/etc/demo.xml

#################################################
# NOTE: This kind of parsing will fail
# if the order of attributes inside the XML tag has changed,
# or if a tag is split across several lines.
# With " as the field separator:
#   $2=fileName, $4=propName, $6=propValue, $10=propType
##################################################
cat "$xmlFile" | grep "propName=" | awk '{print $2,$10,$4,$6}' FS='"' OFS=';' > "$propFile"

# each line of $propFile is: fileName;propType;propName;propValue
while read -r prop; do
    fileName=$(echo "$prop" | cut -f1 -d';')
    propType=$(echo "$prop" | cut -f2 -d';')
    propName=$(echo "$prop" | cut -f3 -d';')
    propValue=$(echo "$prop" | cut -f4 -d';')

    echo "$fileName" "$propName" "$propValue"
done < "$propFile"

#restore the IFS
IFS=$SAVEIFS
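If the attribute order cannot be relied upon, one alternative (not the approach used in the script above) is to look each attribute up by name instead of by position. The sketch below defines a hypothetical get_attr function that uses grep -o (supported by GNU and BSD grep, though not required by POSIX) to pull out the name="value" pair and then cut to keep only the value. It still assumes double-quoted values, a tag on a single line, and attribute names without regex special characters.

```shell
#!/bin/bash
# Hypothetical helper: print the value of attribute $1 found in text $2.
# Matching whitespace + name + =" avoids confusing propValue with
# propValueType. Only the first match is printed; a missing attribute
# prints nothing.
get_attr() {
    printf '%s\n' "$2" |
        grep -o "[[:space:]]$1=\"[^\"]*\"" |
        head -n 1 |
        cut -d'"' -f2
}

# Attributes deliberately out of the original order.
tag='<Object propValue="Active" fileName="demo.properties" propName="0"/>'
get_attr fileName "$tag"    # prints demo.properties
get_attr propValue "$tag"   # prints Active
```

Because the lookup is by name, the attributes may appear in any order; the trade-off is one grep pipeline per attribute per tag.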


Output (from the echo statement):
demo.properties 0 Active
demo.properties lockscreen true
configResp.properties password.min.length 10

The output format is: fileName [space] propName [space] propValue
You can tweak the output format to suit your requirements.
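As one example of such a tweak, here is a sketch that reads the ;-separated lines directly with IFS=';' read -r, which splits each line into variables without calling cut four times, and prints fileName: propName=propValue with printf. The function name format_props, the output format and the sample input lines are only illustrations; in the script above, the input would come from $propFile.

```shell
#!/bin/bash
# Hypothetical helper: read "fileName;propType;propName;propValue" lines
# from stdin and print them as "fileName: propName=propValue".
# IFS=';' applies only to this read, so the global IFS is left untouched.
format_props() {
    while IFS=';' read -r fileName propType propName propValue; do
        printf '%s: %s=%s\n' "$fileName" "$propName" "$propValue"
    done
}

printf '%s\n' 'demo.properties;security;0;Active' \
              'configResp.properties;custom;password.min.length;10' | format_props
# Expected output:
# demo.properties: 0=Active
# configResp.properties: password.min.length=10
```

Since ';' is not a whitespace character, an empty field such as the one in "demo.properties;;0;Active" is still counted, so the later variables keep their positions.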

Related reading: How to read attributes of an XML tag using an easy shell script - Part 2