I've got a few thousand xml files with different formats I'm trying to convert to CSV. I've noticed that in Excel you can open an XML file 'as an XML table', and it will create a schema based on the XML source data. Is there a way to use this Excel functionality in Powershell so I can automate the conversion of these files?
So far, all my research ends up at places where you need to know the schema to parse the files, which is not helpful given the volume I'm working with.
Here is a sample:
<?xml version='1.0' encoding='UTF-8'?>
<results exportTime="2016-02-11 21:44:04 CST"> <report label="S Report 5x" slug="L211"> <record reporting_date_end="03/28/2014" narrative="null"> <report label="Current Volume"> <record central_volume_loads="166.79" zone1_volume="233.26" zone3_volume="17.52" zone2_volume="48.94" zone1_volume_pounds="6,671,777" zone4_volume_pounds="9,330,260" zone2_volume_pounds="700,694" zone3_volume_pounds="1,957,789"/> </report> <report label="Zone1"> <record imps_percentage="92-94%" total_units="794,007" total_loads="19.85" weighted_average="274.76"/> <record imps_percentage="90%" total_units="1,091,657" total_loads="27.29" weighted_average="264.35"/> <record imps_percentage="85%" total_units="3,467,452" total_loads="86.69" weighted_average="228.20"/> <record imps_percentage="81%" total_units="12,501" total_loads=".31" weighted_average="228.24"/> <record imps_percentage="75%" total_units="195,690" total_loads="4.89" weighted_average="187.30"/> <record imps_percentage="73%" total_units="0" total_loads=".00" weighted_average=".00"/> <record imps_percentage="65%" total_units="1,110,470" total_loads="27.76" weighted_average="166.04"/> </report> <report label="Zone2"> <record imps_percentage="92-94%" total_units="891,007" total_loads="22.28" weighted_average="275.26"/> <record imps_percentage="90%" total_units="1,813,306" total_loads="45.33" weighted_average="265.04"/> <record imps_percentage="85%" total_units="4,378,347" total_loads="109.46" weighted_average="231.04"/> <record imps_percentage="81%" total_units="256,289" total_loads="6.41" weighted_average="224.46"/> <record imps_percentage="75%" total_units="234,200" total_loads="5.86" weighted_average="183.03"/> <record imps_percentage="73%" total_units="0" total_loads=".00" weighted_average=".00"/> <record imps_percentage="65%" total_units="1,757,111" total_loads="43.93" weighted_average="165.35"/> </report> </record> <record reporting_date_end="03/21/2014" narrative="null"> <report label="Current Volume"> <record central_volume_loads="144.13" zone1_volume="215.06" zone3_volume="27.46" zone2_volume="43.47" zone1_volume_pounds="5,765,336" zone4_volume_pounds="8,602,512" zone2_volume_pounds="1,098,540" zone3_volume_pounds="1,738,636"/> </report> <report label="Zone1"> <record imps_percentage="92-94%" total_units="857,233" total_loads="21.43" weighted_average="268.96"/> <record imps_percentage="90%" total_units="826,451" total_loads="20.66" weighted_average="260.82"/> <record imps_percentage="85%" total_units="3,104,954" total_loads="77.62" weighted_average="223.99"/> <record imps_percentage="81%" total_units="8,145" total_loads=".20" weighted_average="248.76"/> <record imps_percentage="75%" total_units="136,304" total_loads="3.41" weighted_average="183.96"/> <record imps_percentage="73%" total_units="0" total_loads=".00" weighted_average=".00"/> <record imps_percentage="65%" total_units="832,249" total_loads="20.81" weighted_average="164.72"/> </report> <report label="Zone2"> <record imps_percentage="92-94%" total_units="995,233" total_loads="24.88" weighted_average="270.50"/> <record imps_percentage="90%" total_units="1,696,678" total_loads="42.42" weighted_average="262.66"/> <record imps_percentage="85%" total_units="3,929,416" total_loads="98.24" weighted_average="227.55"/> <record imps_percentage="81%" total_units="269,549" total_loads="6.74" weighted_average="220.32"/> <record imps_percentage="75%" total_units="181,125" total_loads="4.53" weighted_average="186.37"/> <record imps_percentage="73%" total_units="0" total_loads=".00" weighted_average=".00"/> <record imps_percentage="65%" total_units="1,530,511" total_loads="38.26" weighted_average="165.33"/> </report> </record> <record reporting_date_end="03/14/2014" narrative="null"> <report label="Current Volume"> <record central_volume_loads="126.80" zone1_volume="186.95" zone3_volume="25.49" zone2_volume="34.67" zone1_volume_pounds="5,071,823" zone4_volume_pounds="7,478,136" zone2_volume_pounds="1,019,473" zone3_volume_pounds="1,386,840"/> </report> <report label="Zone1"> <record imps_percentage="92-94%" total_units="738,249" total_loads="18.46" weighted_average="263.39"/> <record imps_percentage="90%" total_units="783,791" total_loads="19.59" weighted_average="254.45"/> <record imps_percentage="85%" total_units="2,586,286" total_loads="64.66" weighted_average="220.72"/> <record imps_percentage="81%" total_units="49,348" total_loads="1.23" weighted_average="191.71"/> <record imps_percentage="75%" total_units="131,844" total_loads="3.30" weighted_average="182.33"/> <record imps_percentage="73%" total_units="0" total_loads=".00" weighted_average=".00"/> <record imps_percentage="65%" total_units="782,305" total_loads="19.56" weighted_average="150.54"/> </report> </record>
</report>
</results>And here's what Excel turns out:
exportTime label slug reporting_date_end narrative label2 central_volume_loads zone1_volume zone3_volume zone2_volume zone1_volume_pounds zone4_volume_pounds zone2_volume_pounds zone3_volume_pounds imps_percentage total_units total_loads weighted_average
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Current Volume 166.79 233.26 17.52 48.94 6,671,777 9,330,260 700,694 1,957,789
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Zone1 92-94% 794,007 19.85 274.76
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Zone1 90% 1,091,657 27.29 264.35
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Zone1 85% 3,467,452 86.69 228.20
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Zone1 81% 12,501 .31 228.24
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Zone1 75% 195,690 4.89 187.30
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Zone1 73% 0 .00 .00
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Zone1 65% 1,110,470 27.76 166.04
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Zone2 92-94% 891,007 22.28 275.26
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Zone2 90% 1,813,306 45.33 265.04
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Zone2 85% 4,378,347 109.46 231.04
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Zone2 81% 256,289 6.41 224.46
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Zone2 75% 234,200 5.86 183.03
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Zone2 73% 0 .00 .00
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Zone2 65% 1,757,111 43.93 165.35
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Current Volume 144.13 215.06 27.46 43.47 5,765,336 8,602,512 1,098,540 1,738,636
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Zone1 92-94% 857,233 21.43 268.96
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Zone1 90% 826,451 20.66 260.82
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Zone1 85% 3,104,954 77.62 223.99
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Zone1 81% 8,145 .20 248.76
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Zone1 75% 136,304 3.41 183.96
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Zone1 73% 0 .00 .00
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Zone1 65% 832,249 20.81 164.72
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Zone2 92-94% 995,233 24.88 270.50
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Zone2 90% 1,696,678 42.42 262.66
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Zone2 85% 3,929,416 98.24 227.55
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Zone2 81% 269,549 6.74 220.32
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Zone2 75% 181,125 4.53 186.37
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Zone2 73% 0 .00 .00
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Zone2 65% 1,530,511 38.26 165.33
2016-02-11 21:44:04 CST S Report 5x L211 03/14/2014 null Current Volume 126.8 186.95 25.49 34.67 5,071,823 7,478,136 1,019,473 1,386,840
2016-02-11 21:44:04 CST S Report 5x L211 03/14/2014 null Zone1 92-94% 738,249 18.46 263.39
2016-02-11 21:44:04 CST S Report 5x L211 03/14/2014 null Zone1 90% 783,791 19.59 254.45
2016-02-11 21:44:04 CST S Report 5x L211 03/14/2014 null Zone1 85% 2,586,286 64.66 220.72
2016-02-11 21:44:04 CST S Report 5x L211 03/14/2014 null Zone1 81% 49,348 1.23 191.71
2016-02-11 21:44:04 CST S Report 5x L211 03/14/2014 null Zone1 75% 131,844 3.30 182.33
2016-02-11 21:44:04 CST S Report 5x L211 03/14/2014 null Zone1 73% 0 .00 .00
2016-02-11 21:44:04 CST S Report 5x L211 03/14/2014 null Zone1 65% 782,305 19.56 150.54 11 1 Answer
First, your XML is wrong, since it contains elements that are nested inside themselves.
This pertains to the report and record items.
I have posted a fixed version of the file in pastebin,
where I have renamed the outer report and record items
to xreport and xrecord.
Second, since it contains nested elements, they cannot be simply converted to CSV, since one CSV file can only contain one type of element.
Because of the nesting problem, PowerShell cannot be used on such files, as it can only do simple conversions. See the article Cool powershell: From Xml to Csv in two linesfor an example.
PowerShell can be hand-coded to convert XML files of a known structure. For example, see these articles :
A tool that can handle this complexity is the freeXml To Csv Conversion Tool.
To make it work, you will need to download and unpack the following :
- The interactive tool version 16
- Xml To Csv Command Line version 1.5
Once unpacked in one folder, you could use the interactive program to test its functioning on one or two files.
If it does what you need, you can use the console variant to batch-convert files. The format of the batch command is :
XmlToCsv.Console.exe -xml <source.xml> -dir <destination-folder>The batch file needs to convert files that have similar structure each to its own folder, since it will for example convert the above example file into five files named record.csv, report.csv, results.csv, xrecord.csv, xreport.csv. As these names repeat for similarly-structured XML files, the generated files will overwrite each other if they are all directed to the same folder.
If this program does not suite your needs, you will need to write a Powershell script, that should have two phases :
- Phase 1 : Create the CSV header by drilling down to the lowest level
- Phase 2 : Depth-search the entire XML to generate the CSV data-lines.
Your script could loop over the files, for each doing the following :
[xml] $xml = [xml] (Get-Content -Path \path\to\file.xml)
[System.Xml.XmlElement] $root = $xml.get_DocumentElement()$root contains the <results> node that has these properties :
Notice that you are looking for items that have MemberType of Property. Items that have Definition of string are the ones you need to collect, while a System.Object needs to be drilled again :
And again :
This time we need to drill down the item having the Definition of System.Xml.XmlElement :
Now you have all your properties and can construct the header line.
The script itself will need to use the forach command to loop through the attributes of objects and checking the attributes of MemberType and Definition.
I cannot afford the time needed for writing the full script, but hopefully this may get you started.
4