
Fixing Missing Element Tags in XML with Python (xml.etree.ElementTree)

Parsing XML files with Python (xml.etree.ElementTree)

Parsing XML with Namespaces with Python (xml.etree.ElementTree)

Example of approach for XML data with Namespaces using Python xml.etree.ElementTree
Reference: docs.python.org/3/library/xml.etree.elementtree.html
Previous video, overview of the Python xml.etree.ElementTree module: ua-cam.com/video/bWfAD7wAfOI/v-deo.html

Videos

Fixing Missing Element Tags in XML with Python (xml.etree.ElementTree)

15:05

4.3K views, 4 years ago

Example of an approach for XML data that is missing tags in elements, using Python xml.etree.ElementTree.
Reference: docs.python.org/3/library/xml.etree.elementtree.html
Previous video, overview of the Python xml.etree.ElementTree module: ua-cam.com/video/bWfAD7wAfOI/v-deo.html
Next video, XML with Namespaces: ua-cam.com/video/aB_koPUNqfo/v-deo.html

Parsing XML files with Python (xml.etree.ElementTree)

35:04

77K views, 4 years ago

Overview of the Python xml.etree.ElementTree module for parsing, editing, and creating XML files.
Reference: docs.python.org/3/library/xml.etree.elementtree.html
Next video in the series, covering special cases such as elements with missing tags: ua-cam.com/video/5BrVPpOifto/v-deo.html

COMMENTS

@konradp6379 3 months ago
Is there a built-in way to avoid having to format this XML?
@konradp6379 3 months ago
Hi everyone. I tried it on my own the same way you did, and only afterwards found your video. Thank you very much for making it.
@piercenorton1544 4 months ago
This was incredibly helpful. Thank you!
@noneyahbiz6976 4 months ago
thank you
@nk461 6 months ago
What the font bro? 😅 Can't read, I think I am too old
@fcento 6 months ago
😅
@debasishsahoo1268 7 months ago
Awesome
@jdvelasquezr 7 months ago
Thank you, Francesco, for taking the time to review this library's different functions. You have greatly helped me finish a much-needed script for our localization engineering tasks. Notably, adding text to an existing tag saved the day.
@ShivModiShankar 11 months ago
Thanks for saving my day Francesco :)
@attilioturco 11 months ago
nice vid thanks
@AnEngineeringGirl 1 year ago
After editing the xml file, I don't want the ns tag in each line. What should I do?
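One possible answer, assuming the "ns tag" here means the generated `ns0:` prefixes that ElementTree writes on output: registering the namespace URI under an empty prefix before serializing makes it the default namespace. The URI and the document below are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and document, for illustration only.
NS_URI = "http://example.com/ns"
xml_text = f'<root xmlns="{NS_URI}"><item>1</item></root>'

root = ET.fromstring(xml_text)

# Without registration, ElementTree invents prefixes such as ns0: on output.
before = ET.tostring(root, encoding="unicode")

# Registering the empty prefix makes this URI the default namespace on output.
ET.register_namespace("", NS_URI)
after = ET.tostring(root, encoding="unicode")

print(before)
print(after)
```

The registration is global to the process, so it only needs to be done once, before writing the file.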
@xst-k6 1 year ago
Can you show us how to parse a Tableau dashboard file (*.twb)? It's an XML file, Tableau just renamed it. I am trying to create a data dictionary from the .twb file.
@saranya548 1 year ago
Thank you Francesco for explaining this concept so easily with a demo.
@ImtiazEbnaMannan 1 year ago
Thanks a lot for the great tutorial. Your approach to XML parsing was spot-on for me and it was exactly what I was looking for to get started on XML parsing.
@RodrigoMontes 1 year ago
Excellent man! This is what I was looking for :)
@equipagescatamaranlangrune8331 1 year ago
Nice job Francesco, thank you. I do have a question regarding your last example. How do you get x.tag without the namespace in it?
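A minimal sketch of one way to do this, not taken from the video: ElementTree stores a namespaced tag as `{uri}name`, so the local name is whatever follows the closing brace. The helper `local_name` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Return the tag without its '{namespace-uri}' prefix, if it has one."""
    return tag.split("}", 1)[1] if tag.startswith("{") else tag


# Hypothetical document with a default namespace.
root = ET.fromstring('<a xmlns="http://example.com/ns"><b/><c/></a>')

names = [local_name(x.tag) for x in root.iter()]
print(names)
```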
@bayrakmusti1 1 year ago
That's how it is supposed to be taught. I have been browsing the courses on how to do it and they all are complicated. Thankfully found this video. Thanks a lot. Great job!
@nealrutgerskid 1 year ago
thank you
@5328csabi 2 years ago
Tried to find a solution on Stack Overflow and other pages, and this video was the solution: good examples and good explanations, with actual code being run. Thank you!
@stevemorse5052 2 years ago
Francesco, thank you. Thanks to you I now somewhat understand what is happening in the XML file I have; I did not know that it contained namespaces. I am writing this within the first 6 minutes, before I have read the comments or watched the whole video. You have saved me hours of coding, as I was going to write my own parser. All the other videos sorta, kinda, neglected this small detail! Now, after reading the comments, I see you have helped a LOT of people. Thank you again.
@narayanamurthyuppala6049 2 years ago
Thanks a lot. It saved a lot of my time
@aryan6536 2 years ago
Have you stopped creating videos?
@fcento 2 years ago
Is there anything in particular you would like to see? I've been thinking of possibly doing a video on CuPy
@jezhayes 2 years ago
Thank you so much, I was beginning to think I was cursed to manually write a text parser for these xml files forever.
@davidjnevin 2 years ago
Really excellent explanation. Thank you so much.
@maloman1989 2 years ago
Really crystal-clear tutorial. I now understand a lot of things I didn't understand about XML namespaces. Thanks a lot, guy!
@giacomocillari4448 2 years ago
Is there a way to change a sub-element instead of the whole element string? Let's say, for example, that I want to change W to SW but not the name, and I need to do it in a loop, so I can't hard-code the name string because it changes every time. Is there a way to call the specific sub-element?
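A possible sketch for this, assuming that "W" refers to a `direction` attribute on `<neighbor>` elements like the ones in the country sample quoted elsewhere in these comments. Each attribute can be read with `get()` and changed with `set()` without touching the other attributes.

```python
import xml.etree.ElementTree as ET

# Sample data modelled on the country example; assumed, not the commenter's file.
xml_text = """<data>
  <country name="Liechtenstein">
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
  </country>
</data>"""
root = ET.fromstring(xml_text)

# Change only the direction attribute; the name attribute is left untouched.
for neighbor in root.iter("neighbor"):
    if neighbor.get("direction") == "W":
        neighbor.set("direction", "SW")

directions = {n.get("name"): n.get("direction") for n in root.iter("neighbor")}
print(directions)
```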
@markdillon9588 2 years ago
Can you mass-edit multiple files?
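One way this could look, as a rough sketch: loop over the files with `pathlib`, apply the same edit to each tree, and write each file back. The function name `bump_ranks`, the `<rank>` edit, and the throwaway files are all hypothetical stand-ins for whatever change is actually needed.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def bump_ranks(folder: Path) -> int:
    """Add 1 to every <rank> in each *.xml file in folder; return the file count."""
    count = 0
    for path in sorted(folder.glob("*.xml")):
        tree = ET.parse(path)
        for rank in tree.getroot().iter("rank"):
            rank.text = str(int(rank.text) + 1)
        # Overwrite the file in place with the edited tree.
        tree.write(path, encoding="utf-8", xml_declaration=True)
        count += 1
    return count


# Demo on two throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("a.xml", "b.xml"):
        (folder / name).write_text("<data><rank>1</rank></data>", encoding="utf-8")
    edited = bump_ranks(folder)
    ranks = [ET.parse(p).getroot().findtext("rank") for p in sorted(folder.glob("*.xml"))]

print(edited, ranks)
```

Because the files are overwritten in place, it is worth running this kind of loop on a copy of the data first.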
@vvtwins4kidz 2 years ago
This code is specific to one particular XML file. The code should be generic for any XML: if tags are missing, it should add the missing tags.
@aryan6536 2 years ago
Surely you can use this information now to achieve many things. Are you having any issues? Please share; someone might be able to help.
@skillbuilder138 2 years ago
Hi, how do I write the content of etree.dump to an XML file?
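A short sketch of one answer: `ET.dump()` only prints to standard output, so to save the same content the element can be wrapped in an `ElementTree` and written with `write()`. The output file name below is hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

root = ET.fromstring("<data><rank>1</rank></data>")

# ET.dump(root) would only print; wrap the element in an ElementTree to save it.
out_path = os.path.join(tempfile.mkdtemp(), "output.xml")  # hypothetical file name
ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)

with open(out_path, encoding="utf-8") as fh:
    saved = fh.read()
print(saved)
```

If the tree came from `ET.parse()`, the returned object already has `write()`, so no wrapping is needed.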
@UsmanSaadat 2 years ago
Thanks a lot for this video. I couldn't grasp the concepts properly even after reading from books. This video made it look like a piece of cake.
@vijayalakshmi8282 2 years ago
Hi Francesco, great video, thanks. I need a small suggestion. Let's say I have <KTOPL> 100</KTOPL>, and I need output like "KTOPL 100", i.e. both the tag and the value. How can we get that? Can you please explain?
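A minimal sketch of what this might look like: every element exposes its name as `.tag` and its content as `.text`. The wrapping `<row>` element and the second field `<BUKRS>` are invented here so the snippet has something to loop over.

```python
import xml.etree.ElementTree as ET

# <row> and <BUKRS> are hypothetical; only <KTOPL> comes from the question.
root = ET.fromstring("<row><KTOPL> 100</KTOPL><BUKRS>A1</BUKRS></row>")

# .text can be None for empty elements, so fall back to "" before stripping.
pairs = [(child.tag, (child.text or "").strip()) for child in root]
for tag, value in pairs:
    print(tag, value)
```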
@A_A7337 2 years ago
Great video. Thanks
@sidjjj 2 years ago
Thanks for this video. I needed to parse XML from a variable instead of a file and found this: xml_data_tree = ET.fromstring(received_packet)
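To illustrate the call above with a made-up payload: `ET.fromstring()` accepts a `str` or `bytes` value and returns the root `Element` directly (not an `ElementTree`). The packet content is an assumption for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload standing in for data received from elsewhere.
received_packet = b"<packet id='7'><status>ok</status></packet>"

# fromstring returns the root Element, so find/get can be used on it directly.
xml_data_tree = ET.fromstring(received_packet)
packet_id = xml_data_tree.get("id")
status = xml_data_tree.findtext("status")
print(packet_id, status)
```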
@CinemagicMindset 2 years ago
Hi Francesco, I'm getting an error while parsing an XML file because it contains special words. Kindly help me avoid this error. Error: xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 277, column 366
@fcento 2 years ago
If you are sure the file you have is valid XML (there are online tools to help you there), then what comes to mind is incorrect encoding. Check the documentation here: docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.XMLParser
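A sketch of the encoding scenario, under the assumption that the file is in a single-byte encoding but does not declare it; the actual cause in the commenter's file is unknown. `XMLParser(encoding=...)` overrides the encoding the parser would otherwise assume.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: Latin-1 bytes with no XML declaration naming the encoding.
data = "<name>caf\u00e9</name>".encode("iso-8859-1")

try:
    ET.fromstring(data)  # the parser assumes UTF-8 here
    default_ok = True
except ET.ParseError:
    default_ok = False

# Passing the real encoding to XMLParser overrides the parser's assumption.
parser = ET.XMLParser(encoding="iso-8859-1")
text = ET.fromstring(data, parser=parser).text
print(default_ok, text)
```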
@Gamer-mg6my 2 years ago
Hi i'm trying to get the text of every tag named <Text></Text>, but inside every tag has this: <pp IX='0'/><cp IX='0'/>, some idea to extract/ the content of the tags?: <?xml version='1.0' encoding='utf-8'?> <VisioDocument xmlns='urn:schemas-microsoft-com:office:visio'> <Colors> <ColorEntry IX='0' RGB='#000000'/> <ColorEntry IX='1' RGB='#FFFFFF'/> </Colors> <Fonts> <FontEntry ID='0' Unicode='0' Weight='0' Attributes='23040' CharSet='0' PitchAndFamily='18' Name='monospace'/> </Fonts> <StyleSheets> <StyleSheet ID='0' NameU='No Style'> <StyleProp> <EnableFillProps>1</EnableFillProps> <EnableLineProps>1</EnableLineProps> <EnableTextProps>1</EnableTextProps> <HideForApply>0</HideForApply> </StyleProp> <Line> <LineColor>#000000</LineColor> <LinePattern>1</LinePattern> <LineWeight>0.010000</LineWeight> </Line> <Fill> <FillBkgnd>#000000</FillBkgnd> <FillForegnd>#000000</FillForegnd> <FillPattern>1</FillPattern> <ShdwForegnd>#000000</ShdwForegnd> </Fill> <TextBlock> <BottomMargin>0.000000</BottomMargin> <DefaultTabStop>0.590551</DefaultTabStop> <LeftMargin>0.000000</LeftMargin> <RightMargin>0.000000</RightMargin> <TextBkgnd>0</TextBkgnd> <TextBkgndTrans>0.000000</TextBkgndTrans> <TextDirection>0</TextDirection> <TopMargin>0.000000</TopMargin> <VerticalAlign>1</VerticalAlign> </TextBlock> <Char IX='0'> <Color>#000000</Color> 0 <FontScale>1.000000</FontScale> <Size>0.166667</Size> </Char> <Para IX='0'> <BulletFontSize>-1</BulletFontSize> <BulletStr>&#xe000;</BulletStr> <HorzAlign>0</HorzAlign> <SpLine>-1.200000</SpLine> </Para> <Tabs IX='0'/> </StyleSheet> </StyleSheets> <Pages> <Page ID='0'> <PageSheet ID='0'> <PageProps> <PageWidth>1.651575</PageWidth> <PageHeight>0.748031</PageHeight> </PageProps> </PageSheet> <Shapes> <Shape ID='1' Type='Shape' FillStyle='0' LineStyle='0' TextStyle='0' NameU='Polygon.1'> <Data1>0</Data1> <Data2>0</Data2> <Data3>0</Data3> <XForm> <Height>0.708661</Height> <PinX>3.720472</PinX> <PinY>6.023622</PinY> <Width>1.612205</Width> </XForm> <Fill> 
<FillBkgnd>#000000</FillBkgnd> <FillForegnd>#FFFFFF</FillForegnd> <FillPattern>1</FillPattern> <ShdwForegnd>#000000</ShdwForegnd> </Fill> <Line> <LineCap>1</LineCap> <LineColor>#000000</LineColor> <LinePattern>1</LinePattern> <LineWeight>0.039370</LineWeight> </Line> <Geom IX='0'> <NoFill>0</NoFill> <NoLine>0</NoLine> <NoShow>0</NoShow> <NoSnap>0</NoSnap> <MoveTo IX='1'> <X>0.000000</X> <Y>0.000000</Y> </MoveTo> <LineTo IX='2'> <X>1.612205</X> <Y>0.000000</Y> </LineTo> <LineTo IX='3'> <X>1.612205</X> <Y>-0.708661</Y> </LineTo> <LineTo IX='4'> <X>0.000000</X> <Y>-0.708661</Y> </LineTo> <LineTo IX='5'> <X>0.000000</X> <Y>0.000000</Y> </LineTo> </Geom> </Shape> <Shape ID='2' Type='Shape' FillStyle='0' LineStyle='0' TextStyle='0' NameU='Text.2'> <Data1>0</Data1> <Data2>0</Data2> <Data3>0</Data3> <XForm> <Height>0.247563</Height> <PinX>3.889961</PinX> <PinY>5.511811</PinY> <Width>1.273228</Width> </XForm> <Char IX='0'> <Color>#000000</Color> 0 <FontScale>1.000000</FontScale> <Size>0.247563</Size> </Char> <Para IX='0'> <HorzAlign>1</HorzAlign> </Para> <Text><pp IX='0'/><cp IX='0'/>Entity</Text> </Shape> <Shape ID='1' Type='Shape' FillStyle='0' LineStyle='0' TextStyle='0' NameU='Polygon.1'> <Data1>0</Data1> <Data2>0</Data2> <Data3>0</Data3> <XForm> <Height>0.708661</Height> <PinX>3.720472</PinX> <PinY>6.023622</PinY> <Width>1.612205</Width> </XForm> <Fill> <FillBkgnd>#000000</FillBkgnd> <FillForegnd>#FFFFFF</FillForegnd> <FillPattern>1</FillPattern> <ShdwForegnd>#000000</ShdwForegnd> </Fill> <Line> <LineCap>1</LineCap> <LineColor>#000000</LineColor> <LinePattern>1</LinePattern> <LineWeight>0.039370</LineWeight> </Line> <Geom IX='0'> <NoFill>0</NoFill> <NoLine>0</NoLine> <NoShow>0</NoShow> <NoSnap>0</NoSnap> <MoveTo IX='1'> <X>0.000000</X> <Y>0.000000</Y> </MoveTo> <LineTo IX='2'> <X>1.612205</X> <Y>0.000000</Y> </LineTo> <LineTo IX='3'> <X>1.612205</X> <Y>-0.708661</Y> </LineTo> <LineTo IX='4'> <X>0.000000</X> <Y>-0.708661</Y> </LineTo> <LineTo IX='5'> <X>0.000000</X> 
<Y>0.000000</Y> </LineTo> </Geom> </Shape> <Shape ID='2' Type='Shape' FillStyle='0' LineStyle='0' TextStyle='0' NameU='Text.2'> <Data1>0</Data1> <Data2>0</Data2> <Data3>0</Data3> <XForm> <Height>0.247563</Height> <PinX>3.889961</PinX> <PinY>5.511811</PinY> <Width>1.273228</Width> </XForm> <Char IX='0'> <Color>#000000</Color> 0 <FontScale>1.000000</FontScale> <Size>0.247563</Size> </Char> <Para IX='0'> <HorzAlign>1</HorzAlign> </Para> <Text><pp IX='0'/><cp IX='0'/>EntityTwo</Text> </Shape> <Shape ID='1' Type='Shape' FillStyle='0' LineStyle='0' TextStyle='0' NameU='Polygon.1'> <Data1>0</Data1> <Data2>0</Data2> <Data3>0</Data3> <XForm> <Height>0.708661</Height> <PinX>3.720472</PinX> <PinY>6.023622</PinY> <Width>1.612205</Width> </XForm> <Fill> <FillBkgnd>#000000</FillBkgnd> <FillForegnd>#FFFFFF</FillForegnd> <FillPattern>1</FillPattern> <ShdwForegnd>#000000</ShdwForegnd> </Fill> <Line> <LineCap>1</LineCap> <LineColor>#000000</LineColor> <LinePattern>1</LinePattern> <LineWeight>0.039370</LineWeight> </Line> <Geom IX='0'> <NoFill>0</NoFill> <NoLine>0</NoLine> <NoShow>0</NoShow> <NoSnap>0</NoSnap> <MoveTo IX='1'> <X>0.000000</X> <Y>0.000000</Y> </MoveTo> <LineTo IX='2'> <X>1.612205</X> <Y>0.000000</Y> </LineTo> <LineTo IX='3'> <X>1.612205</X> <Y>-0.708661</Y> </LineTo> <LineTo IX='4'> <X>0.000000</X> <Y>-0.708661</Y> </LineTo> <LineTo IX='5'> <X>0.000000</X> <Y>0.000000</Y> </LineTo> </Geom> </Shape> <Shape ID='2' Type='Shape' FillStyle='0' LineStyle='0' TextStyle='0' NameU='Text.2'> <Data1>0</Data1> <Data2>0</Data2> <Data3>0</Data3> <XForm> <Height>0.247563</Height> <PinX>3.889961</PinX> <PinY>5.511811</PinY> <Width>1.273228</Width> </XForm> <Char IX='0'> <Color>#000000</Color> 0 <FontScale>1.000000</FontScale> <Size>0.247563</Size> </Char> <Para IX='0'> <HorzAlign>1</HorzAlign> </Para> <Text><pp IX='0'/><cp IX='0'/>EntityThree</Text> </Shape> </Shapes> </Page> </Pages> </VisioDocument>
@fcento 2 years ago
Let's take it in steps. I'm assuming you want to extract 'Entity', 'EntityTwo', 'EntityThree' from the element <Text> (let me know if I misunderstood your question). The way it's formatted, it contains 2 elements (<pp> and <cp>) as well as the piece of text you want to extract. If you just use findall() and read 'text', you get None back; what you want to use in this case is 'tail' instead. I've included sample code here: gist.github.com/fcento100/74b8691af014a8126f8e9ca2ff03c6ea
@fcento 2 years ago
I've put the XML code from your comment in a file here for convenience: gist.github.com/fcento100/19cb7ae6b857c539a2c2843519239efc
@Gamer-mg6my 2 years ago
@fcento Yes, you understood me well. Ohhh, with tail. Well, I checked it, but with another XML it didn't work :( Instead I used findall('.//cp', ns) and printed elm.tail, and with that we got the text. I like your solution more, but with the other XML it didn't work. This is the error I got: elmtail = elm.tail.strip() AttributeError: 'NoneType' object has no attribute 'strip'
@fcento 2 years ago
Apologies for not catching the 'NoneType' error; 'tail' returns None if it doesn't find anything, rather than an empty string. It's fixed now in this version, with an if statement to catch it: gist.github.com/fcento100/11847ad0d8d42eec6c1dc42de897b842 The reason I wasn't getting this error is that I copied and pasted from your message, and since it was formatted, 'tail' returned '\n' and '\t' (the string representations of new-line and tab) where it should have returned None, which is why I was able to run the strip command everywhere without error. In the new code I posted, I've shown 2 methods of getting at that piece of data. In your sample XML, "Entity" etc. is the tail of <cp>; root.findall('.//visio:Text/',ns) and root.findall('.//visio:cp',ns) do similar things. The only difference is that using './/visio:Text/' in method 1 will also extract the tail of <pp> if it is available, which may be undesirable! In that case './/visio:cp', like you suggested, is the way to go.
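The exchange above can be sketched roughly as follows. This is not the code from the linked gists; it is a cut-down illustration using a shortened version of the Visio sample, with a third empty `<Text>` added to show the None case.

```python
import xml.etree.ElementTree as ET

ns = {"visio": "urn:schemas-microsoft-com:office:visio"}

# Shortened sample; the last <Text> has nothing after <cp>, so its tail is None.
xml_text = """<VisioDocument xmlns='urn:schemas-microsoft-com:office:visio'>
  <Shape><Text><pp IX='0'/><cp IX='0'/>Entity</Text></Shape>
  <Shape><Text><pp IX='0'/><cp IX='0'/>EntityTwo</Text></Shape>
  <Shape><Text><pp IX='0'/><cp IX='0'/></Text></Shape>
</VisioDocument>"""
root = ET.fromstring(xml_text)

labels = []
for cp in root.findall(".//visio:cp", ns):
    # tail is None when nothing follows the element, so guard before strip().
    tail = (cp.tail or "").strip()
    if tail:
        labels.append(tail)
print(labels)
```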
@Gamer-mg6my 2 years ago
@fcento Many thanks for your kind help, Francesco :))
@Gamer-mg6my 2 years ago
Hi. I have this xmlns and I tried every solution, but I can't read the XML with this namespace: <VisioDocument xmlns='urn:schemas-microsoft-com:office:visio'> ..... </VisioDocument> Any idea how to solve this? :/ It is XML, but the file extension is .vdx
@fcento 2 years ago
Not enough information. What error do you get?
@Gamer-mg6my 2 years ago
@fcento Hi, thank you very much for your contributions. I don't know why it didn't work a while ago, but now it works. In the end I did it with the solution you recommended at 14:20.
@IMMORTALmen 2 years ago
Thank you! I spent almost two days trying to resolve a namespace issue, and then found this video. Thank you.
@stanleymbah8983 2 years ago
thank you for this
@vishalkalal22 2 years ago
Hi Francesco, In my xml closing tag is missing for example, Current xml - <data> <country name="Liechtenstein"> <rank>1</rank> <year>2008</year> <gdppc>141100</gdppc> <neighbor name="Austria" direction="E"/> <neighbor name="Switzerland" direction="W"/> </country> Expected xml - <data> <country name="Liechtenstein"> <rank>1</rank> <year>2008</year> <gdppc>141100</gdppc> <neighbor name="Austria" direction="E"/> <neighbor name="Switzerland" direction="W"/> </country> </data> Can you please help me with this?
@fcento 2 years ago
If the closing tag is missing, it is technically an invalid file. You may have to manually close it yourself as there is no way for the code to know where it should be closed.
@AnilKumar23456 2 years ago
Vishal, you can use a regex to find the missing closing tag and then just replace it with the desired closing tag.
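For the narrow case in the question, where only the root element's closing tag is missing at the end of the file, a simple sketch is to try parsing and, on failure, append the closing tag and try again. This is a try/except approach rather than the regex one mentioned above, the helper name `close_root` is hypothetical, and it does not help when a tag is missing somewhere in the middle of the document.

```python
import xml.etree.ElementTree as ET

# Sample with the final </data> missing, shortened from the question.
broken = """<data>
  <country name="Liechtenstein">
    <rank>1</rank>
  </country>
"""


def close_root(text: str, root_tag: str) -> ET.Element:
    """Parse text; if that fails, retry once with </root_tag> appended."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return ET.fromstring(text + f"</{root_tag}>")


root = close_root(broken, "data")
print(root.tag, root.find("country").get("name"))
```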
@myyoutubeaccount0123_ 2 years ago
thanks a lot
@fcento 2 years ago
Happy to help
@kn7298 2 years ago
I know it's a bit late, but you can correct the XML layout after adding a new subelement by using the indent function of ElementTree. This refreshes the automatic indentation for the whole tree (or element). You may need to specify space='    ' (4 spaces) to override the default setting and match the default layout of the dump method.
ET.indent(tree, space='    ')  # for the entire tree
ET.indent(elm, space='    ', level=1)  # for just the element
I did have problems with inconsistent spacing when trying to indent just the element, though!! :O
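A small runnable illustration of the tip above, on made-up data. `ET.indent()` is available from Python 3.9 and rewrites the whitespace in place.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<data><country><rank>1</rank></country></data>")

# A newly added subelement carries no surrounding whitespace.
ET.SubElement(root.find("country"), "year").text = "2008"

# ET.indent (Python 3.9+) re-indents in place; the default is two spaces.
ET.indent(root, space="    ")
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```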
@AlvaroMelchor 3 years ago
Thanks, it's very clear. Most tutorials cover only XML without namespaces.
@tessdejaeghere6972 3 years ago
Super helpful, thanks a lot!
@maxnoish 3 years ago
Brilliant !!!
@LukasNachtigall 3 years ago
Hey man! You just helped me to finish my parser! I do not know how, but I finally was able to find and change the text in each specific element in my XML file. I still do not understand why I have to register the namespace, but it did the work perfectly. Your video was really helpful. I didn't understand 70% of it the first time, but now it's clearer to me. Thanks man!
@fcento 3 years ago
Glad I could help!
@shrinivasulunandyala9269 3 years ago
Merging XML files using Python: can you please make a video on this topic?
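One simple interpretation of "merge", sketched under the assumption that both files share the same root element and the goal is just to collect all children under one root. The two documents below are made up; real files would be loaded with `ET.parse()` instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents standing in for two files with the same root element.
first = ET.fromstring("<data><country name='A'/></data>")
second = ET.fromstring("<data><country name='B'/><country name='C'/></data>")

# Append every child of the second root to the first root.
first.extend(list(second))

merged_names = [c.get("name") for c in first.findall("country")]
print(merged_names)
```

Deduplicating entries or reconciling conflicting values would need extra logic on top of this.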
@andrewbourne2296 3 years ago
That was SO helpful Dude. Thank you so much.
@rupeshbhuju2897 3 years ago
Hi Francesco Cento. I would like to know how to handle the two use cases below: 1) Incorrect type of data inside an element, e.g. a string inside an element that is supposed to hold an integer. 2) Missing element: an element that must be present according to the XSD is not present in the XML. Could you please suggest any ideas on this? Thanks
@fcento 3 years ago
Rupesh, for your use case you may have to refer to a different library called "lxml", which has XML Schema (XSD) support. I'm not experienced with this, but looking at the documentation, it has an example of how to construct a validator, which should address both your issues.
@rupeshbhuju2897 3 years ago
@fcento Thank you for your suggestion. I will look into that more.
@GuitFishN 3 years ago
This was a huge help. Thank you!
@thomasloia8874 3 years ago
Superb, exactly what I needed to know. Thank you
@arshap9351 3 years ago
Increase your font size before doing tutorials. It's quite hard to read the text. Anyway, good job.

fcento
