fcento
Joined 11 Jun 2013
Parsing XML with Namespaces with Python (xml.etree.ElementTree)
Example of approach for XML data with Namespaces using Python xml.etree.ElementTree
Reference: docs.python.org/3/library/xml.etree.elementtree.html
Previous video, overview of the Python xml.etree.ElementTree module: ua-cam.com/video/bWfAD7wAfOI/v-deo.html
Views: 20,565
Videos
Fixing Missing Element Tags in XML with Python (xml.etree.ElementTree)
4.3K views, 4 years ago
Example of approach for XML data which is missing tags in elements using Python xml.etree.ElementTree. Reference: docs.python.org/3/library/xml.etree.elementtree.html Previous video, overview of the Python xml.etree.ElementTree module: ua-cam.com/video/bWfAD7wAfOI/v-deo.html Next video, XML with Namespaces: ua-cam.com/video/aB_koPUNqfo/v-deo.html
Parsing XML files with Python (xml.etree.ElementTree)
77K views, 4 years ago
Overview of the Python xml.etree.ElementTree module for parsing, editing and creating XML files. Reference: docs.python.org/3/library/xml.etree.elementtree.html Next video of the series, covering special cases such as elements with missing tags: ua-cam.com/video/5BrVPpOifto/v-deo.html
Is there some built-in way to avoid having to format this xml?
Hi everyone. I tried it on my own, the same way you did it. After that I found your video. Thanks a lot for making it.
This was incredibly helpful. Thank you!
thank you
What the font bro? 😅 Can't read, I think I am too old
😅
Awesome
Thank you, Francesco, for taking the time to review this library's different functions. You have greatly helped me finish a much-needed script for our localization engineering tasks. Notably, adding text to an existing tag saved the day.
Thanks for saving my day Francesco :)
nice vid thanks
After editing the xml file, I don't want the ns tag in each line. What should I do?
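One way to avoid the generated ns0: prefixes is to register the namespace before writing the tree out. A minimal sketch, assuming a made-up namespace URI and a tiny document (neither comes from the video); registering an empty prefix makes ElementTree emit that URI as the default namespace:

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and document, for illustration only.
NS_URI = "http://example.com/ns"
xml_text = f'<root xmlns="{NS_URI}"><item>1</item></root>'

root = ET.fromstring(xml_text)

# Without registration, ElementTree invents prefixes such as ns0 on output.
before = ET.tostring(root, encoding="unicode")

# Registering an empty prefix maps this URI to the default namespace.
# Note that the registration is global for the running process.
ET.register_namespace("", NS_URI)
after = ET.tostring(root, encoding="unicode")

print(before)
print(after)
```

The same registration applies when saving with tree.write(), as long as it happens before the write.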
Can you show us how to parse a Tableau dashboard file (*.twb)? It's an XML file, Tableau just renamed it. I am trying to create a data dictionary from the .twb file.
Thank you Francesco for explaining this concept so easily with a demo.
Thanks a lot for the great tutorial. Your approach to XML parsing was spot-on for me and it was exactly what I was looking for to get started on XML parsing.
Excellent man! This is what I was looking for :)
Nice job Francesco, thank you. I do have a question regarding your last example. How do you get x.tag without the namespace in it ?
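ElementTree stores a namespaced tag as '{uri}name', so one simple option is to split the tag on the closing brace. A small sketch, assuming an invented namespace URI; local_name is a hypothetical helper, not part of the library:

```python
import xml.etree.ElementTree as ET

# Made-up document with a default namespace.
xml_text = '<root xmlns="http://example.com/ns"><item>1</item></root>'
root = ET.fromstring(xml_text)


def local_name(tag):
    """Return the tag without its '{uri}' part, if it has one."""
    return tag.rsplit("}", 1)[-1]


# root.iter() walks the root and every descendant in document order.
names = [local_name(x.tag) for x in root.iter()]
print(names)
```

Tags without a namespace pass through unchanged, because there is no brace to split on.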
That's how it is supposed to be taught. I have been browsing the courses on how to do it and they all are complicated. Thankfully found this video. Thanks a lot. Great job!
thank you
Tried to find a solution on stackoverflow and other pages, and this video was the solution: good examples, good explanation with actual code being run. Thank you!
Francesco, thank you. Thanks to you I now somewhat understand what is happening in the XML file I have; I did not know that it contained namespaces. Within the first 6 minutes, I am writing this before I have read the comments or watched the whole video. You have saved me hours of coding as I was going to write my own parser. All the other videos sorta, kinda, neglected this small detail! Now after reading the comments, I see you have helped a LOT of people, thank you again.
Thanks a lot. It saved lot of my time
Have you stopped creating videos?
Is there anything in particular you would like to see? Been thinking to possibly do a video on CuPy
Thank you so much, I was beginning to think I was cursed to manually write a text parser for these xml files forever.
Really excellent explanation. Thank you so much.
Really crystal clear tutorial, I now understand a lot of things I didn't understand about XML namespaces. Thanks a lot, guy!
Is there a way to change sub-element instead of the whole element string? let's say for example that I want to change W with SW but not the name, and I need to do it in a loop so I can't put the name string inside as it changes anytime, is there a way to call the specific sub element?
can you mass edit multiple files?
this code is specific to specific xml, the code should be generic for any XML, if tags are missing it should add the missing tags
Surely you can use this information now to achieve many things. Are you having any issues? Please share, someone might be able to help.
Hi, How to write the content of etree.dump to an xml file?
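ET.dump() only prints to standard output, so to get the same content into a file the usual route is ElementTree.write(). A minimal sketch, assuming a small sample tree and a temporary output path (both are placeholders):

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Small sample tree; in practice this would be the tree you parsed or edited.
root = ET.fromstring(
    "<data><country name='Liechtenstein'><rank>1</rank></country></data>"
)
tree = ET.ElementTree(root)

# Placeholder output location in a fresh temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "output.xml")

# Write the whole tree, including an XML declaration, as UTF-8.
tree.write(out_path, encoding="utf-8", xml_declaration=True)

# Read the file back to confirm what was written.
with open(out_path, encoding="utf-8") as f:
    written = f.read()
print(written)
```

If you only need the text rather than a file, ET.tostring(root, encoding="unicode") returns it as a string.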
Thanks a lot for this video. I couldn't grasp the concepts properly even after reading from books. This video made it look like piece of cake.
Hi Francesco, great video, thanks. I need a small suggestion: let's say I have <KTOPL> 100</KTOPL>, and I need output like KTOPL 100, so both the tag and the value. How can we get that? Can you please explain?
Great video. Thanks
Thanks for this video, I needed to parse xml from a variable instead of a file and found this : xml_data_tree = ET.fromstring(received_packet)
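For reference, a short sketch of parsing from a variable with ET.fromstring(). The packet content below is invented; only the variable names received_packet and xml_data_tree come from the comment:

```python
import xml.etree.ElementTree as ET

# Stand-in for data received at runtime; the content is made up.
received_packet = "<packet><status>ok</status><value>42</value></packet>"

# fromstring() returns the root Element directly, not an ElementTree.
xml_data_tree = ET.fromstring(received_packet)

status = xml_data_tree.find("status").text
value = int(xml_data_tree.find("value").text)
print(xml_data_tree.tag, status, value)
```

Because the result is already the root element, there is no getroot() call as there is after ET.parse().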
Hi Francesco, I'm getting an error while parsing an xml file since it has special words. Kindly help me avoid this error. Error: xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 277, column 366
If you are sure the file you have is a valid xml (there are online tools to help you there), then what comes to mind is incorrect encoding. Check the documentation here: docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.XMLParser
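A rough illustration of the encoding point, assuming the data is ISO-8859-1 with no encoding declaration (the sample bytes are invented, and the real file may use a different encoding): the default parse fails with an invalid-token error, while an XMLParser with an explicit encoding reads it.

```python
import xml.etree.ElementTree as ET

# Made-up bytes: Latin-1 encoded text with no encoding declaration.
raw = "<note>café</note>".encode("iso-8859-1")

# By default the bytes are treated as UTF-8, which they are not.
try:
    ET.fromstring(raw)
    default_ok = True
except ET.ParseError as err:
    default_ok = False
    print("default parse failed:", err)

# Telling the parser the actual encoding overrides that assumption.
parser = ET.XMLParser(encoding="iso-8859-1")
root = ET.fromstring(raw, parser=parser)
print(root.text)
```

For a file on disk the same parser object can be passed as ET.parse(path, parser=parser).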
Hi i'm trying to get the text of every tag named <Text></Text>, but inside every tag has this: <pp IX='0'/><cp IX='0'/>, some idea to extract/ the content of the tags?: <?xml version='1.0' encoding='utf-8'?> <VisioDocument xmlns='urn:schemas-microsoft-com:office:visio'> <Colors> <ColorEntry IX='0' RGB='#000000'/> <ColorEntry IX='1' RGB='#FFFFFF'/> </Colors> <Fonts> <FontEntry ID='0' Unicode='0' Weight='0' Attributes='23040' CharSet='0' PitchAndFamily='18' Name='monospace'/> </Fonts> <StyleSheets> <StyleSheet ID='0' NameU='No Style'> <StyleProp> <EnableFillProps>1</EnableFillProps> <EnableLineProps>1</EnableLineProps> <EnableTextProps>1</EnableTextProps> <HideForApply>0</HideForApply> </StyleProp> <Line> <LineColor>#000000</LineColor> <LinePattern>1</LinePattern> <LineWeight>0.010000</LineWeight> </Line> <Fill> <FillBkgnd>#000000</FillBkgnd> <FillForegnd>#000000</FillForegnd> <FillPattern>1</FillPattern> <ShdwForegnd>#000000</ShdwForegnd> </Fill> <TextBlock> <BottomMargin>0.000000</BottomMargin> <DefaultTabStop>0.590551</DefaultTabStop> <LeftMargin>0.000000</LeftMargin> <RightMargin>0.000000</RightMargin> <TextBkgnd>0</TextBkgnd> <TextBkgndTrans>0.000000</TextBkgndTrans> <TextDirection>0</TextDirection> <TopMargin>0.000000</TopMargin> <VerticalAlign>1</VerticalAlign> </TextBlock> <Char IX='0'> <Color>#000000</Color> <Font>0</Font> <FontScale>1.000000</FontScale> <Size>0.166667</Size> </Char> <Para IX='0'> <BulletFontSize>-1</BulletFontSize> <BulletStr>&#xe000;</BulletStr> <HorzAlign>0</HorzAlign> <SpLine>-1.200000</SpLine> </Para> <Tabs IX='0'/> </StyleSheet> </StyleSheets> <Pages> <Page ID='0'> <PageSheet ID='0'> <PageProps> <PageWidth>1.651575</PageWidth> <PageHeight>0.748031</PageHeight> </PageProps> </PageSheet> <Shapes> <Shape ID='1' Type='Shape' FillStyle='0' LineStyle='0' TextStyle='0' NameU='Polygon.1'> <Data1>0</Data1> <Data2>0</Data2> <Data3>0</Data3> <XForm> <Height>0.708661</Height> <PinX>3.720472</PinX> <PinY>6.023622</PinY> <Width>1.612205</Width> 
</XForm> <Fill> <FillBkgnd>#000000</FillBkgnd> <FillForegnd>#FFFFFF</FillForegnd> <FillPattern>1</FillPattern> <ShdwForegnd>#000000</ShdwForegnd> </Fill> <Line> <LineCap>1</LineCap> <LineColor>#000000</LineColor> <LinePattern>1</LinePattern> <LineWeight>0.039370</LineWeight> </Line> <Geom IX='0'> <NoFill>0</NoFill> <NoLine>0</NoLine> <NoShow>0</NoShow> <NoSnap>0</NoSnap> <MoveTo IX='1'> <X>0.000000</X> <Y>0.000000</Y> </MoveTo> <LineTo IX='2'> <X>1.612205</X> <Y>0.000000</Y> </LineTo> <LineTo IX='3'> <X>1.612205</X> <Y>-0.708661</Y> </LineTo> <LineTo IX='4'> <X>0.000000</X> <Y>-0.708661</Y> </LineTo> <LineTo IX='5'> <X>0.000000</X> <Y>0.000000</Y> </LineTo> </Geom> </Shape> <Shape ID='2' Type='Shape' FillStyle='0' LineStyle='0' TextStyle='0' NameU='Text.2'> <Data1>0</Data1> <Data2>0</Data2> <Data3>0</Data3> <XForm> <Height>0.247563</Height> <PinX>3.889961</PinX> <PinY>5.511811</PinY> <Width>1.273228</Width> </XForm> <Char IX='0'> <Color>#000000</Color> <Font>0</Font> <FontScale>1.000000</FontScale> <Size>0.247563</Size> </Char> <Para IX='0'> <HorzAlign>1</HorzAlign> </Para> <Text><pp IX='0'/><cp IX='0'/>Entity</Text> </Shape> <Shape ID='1' Type='Shape' FillStyle='0' LineStyle='0' TextStyle='0' NameU='Polygon.1'> <Data1>0</Data1> <Data2>0</Data2> <Data3>0</Data3> <XForm> <Height>0.708661</Height> <PinX>3.720472</PinX> <PinY>6.023622</PinY> <Width>1.612205</Width> </XForm> <Fill> <FillBkgnd>#000000</FillBkgnd> <FillForegnd>#FFFFFF</FillForegnd> <FillPattern>1</FillPattern> <ShdwForegnd>#000000</ShdwForegnd> </Fill> <Line> <LineCap>1</LineCap> <LineColor>#000000</LineColor> <LinePattern>1</LinePattern> <LineWeight>0.039370</LineWeight> </Line> <Geom IX='0'> <NoFill>0</NoFill> <NoLine>0</NoLine> <NoShow>0</NoShow> <NoSnap>0</NoSnap> <MoveTo IX='1'> <X>0.000000</X> <Y>0.000000</Y> </MoveTo> <LineTo IX='2'> <X>1.612205</X> <Y>0.000000</Y> </LineTo> <LineTo IX='3'> <X>1.612205</X> <Y>-0.708661</Y> </LineTo> <LineTo IX='4'> <X>0.000000</X> <Y>-0.708661</Y> </LineTo> 
<LineTo IX='5'> <X>0.000000</X> <Y>0.000000</Y> </LineTo> </Geom> </Shape> <Shape ID='2' Type='Shape' FillStyle='0' LineStyle='0' TextStyle='0' NameU='Text.2'> <Data1>0</Data1> <Data2>0</Data2> <Data3>0</Data3> <XForm> <Height>0.247563</Height> <PinX>3.889961</PinX> <PinY>5.511811</PinY> <Width>1.273228</Width> </XForm> <Char IX='0'> <Color>#000000</Color> <Font>0</Font> <FontScale>1.000000</FontScale> <Size>0.247563</Size> </Char> <Para IX='0'> <HorzAlign>1</HorzAlign> </Para> <Text><pp IX='0'/><cp IX='0'/>EntityTwo</Text> </Shape> <Shape ID='1' Type='Shape' FillStyle='0' LineStyle='0' TextStyle='0' NameU='Polygon.1'> <Data1>0</Data1> <Data2>0</Data2> <Data3>0</Data3> <XForm> <Height>0.708661</Height> <PinX>3.720472</PinX> <PinY>6.023622</PinY> <Width>1.612205</Width> </XForm> <Fill> <FillBkgnd>#000000</FillBkgnd> <FillForegnd>#FFFFFF</FillForegnd> <FillPattern>1</FillPattern> <ShdwForegnd>#000000</ShdwForegnd> </Fill> <Line> <LineCap>1</LineCap> <LineColor>#000000</LineColor> <LinePattern>1</LinePattern> <LineWeight>0.039370</LineWeight> </Line> <Geom IX='0'> <NoFill>0</NoFill> <NoLine>0</NoLine> <NoShow>0</NoShow> <NoSnap>0</NoSnap> <MoveTo IX='1'> <X>0.000000</X> <Y>0.000000</Y> </MoveTo> <LineTo IX='2'> <X>1.612205</X> <Y>0.000000</Y> </LineTo> <LineTo IX='3'> <X>1.612205</X> <Y>-0.708661</Y> </LineTo> <LineTo IX='4'> <X>0.000000</X> <Y>-0.708661</Y> </LineTo> <LineTo IX='5'> <X>0.000000</X> <Y>0.000000</Y> </LineTo> </Geom> </Shape> <Shape ID='2' Type='Shape' FillStyle='0' LineStyle='0' TextStyle='0' NameU='Text.2'> <Data1>0</Data1> <Data2>0</Data2> <Data3>0</Data3> <XForm> <Height>0.247563</Height> <PinX>3.889961</PinX> <PinY>5.511811</PinY> <Width>1.273228</Width> </XForm> <Char IX='0'> <Color>#000000</Color> <Font>0</Font> <FontScale>1.000000</FontScale> <Size>0.247563</Size> </Char> <Para IX='0'> <HorzAlign>1</HorzAlign> </Para> <Text><pp IX='0'/><cp IX='0'/>EntityThree</Text> </Shape> </Shapes> </Page> </Pages> </VisioDocument>
Let's take it in steps. I'm assuming you want to extract 'Entity', 'EntityTwo', 'EntityThree' from the element <Text> (...let me know if i misunderstood your question). The way it's formatted it contains 2 elements (<pp> and <cp>) as well as the piece of text you want to extract. If you just use findall() and use 'text' you get None back, what you want to use in this case is 'tail' instead. I've included a sample code here: gist.github.com/fcento100/74b8691af014a8126f8e9ca2ff03c6ea
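To make the text-versus-tail difference concrete, here is a minimal sketch using a cut-down version of the XML from the question (one <Text> element only). It is an illustration, not the code from the gist:

```python
import xml.etree.ElementTree as ET

ns = {"visio": "urn:schemas-microsoft-com:office:visio"}

# Cut-down sample: a single <Text> element from the posted document.
xml_text = (
    "<VisioDocument xmlns='urn:schemas-microsoft-com:office:visio'>"
    "<Text><pp IX='0'/><cp IX='0'/>Entity</Text>"
    "</VisioDocument>"
)
root = ET.fromstring(xml_text)

text_elem = root.find(".//visio:Text", ns)
pp = text_elem.find("visio:pp", ns)
cp = text_elem.find("visio:cp", ns)

# 'text' is only what sits between <Text> and its first child: nothing here.
print(text_elem.text)
# 'tail' is what follows an element's closing tag, so the string hangs off <cp>.
print(cp.tail)
```

Here <pp> is followed immediately by <cp>, so its tail is None, which is why calling strip() on every tail without a check can fail.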
i've put the xml code from your comment in a file here gist.github.com/fcento100/19cb7ae6b857c539a2c2843519239efc for convenience
@fcento Yes, you understood me well. Ohhh, with tail. Well, I checked it, but with another xml it didn't work :( so instead I used findall('.//cp', ns) and printed elm.tail, and with that we got the text. I like your solution more, but with the other xml it didn't work :((((( This is the error that I got: elmtail = elm.tail.strip() AttributeError: 'NoneType' object has no attribute 'strip'
Apologies for not catching the 'NoneType' error; 'tail' returns None if it doesn't find anything, rather than an empty string. It's fixed now in this version: gist.github.com/fcento100/11847ad0d8d42eec6c1dc42de897b842 with an if statement to catch it. The reason I wasn't getting this error is that I copied and pasted from your message, and since it was formatted, 'tail' returned '\n' and '\t' (the string representations of new-line and tab) where it should have returned None, which is why I was able to run the strip command everywhere without error. In the new code I posted I've shown 2 methods of getting at that piece of data; in your sample xml "Entity" etc. is the tail of <cp>; root.findall('.//visio:Text/',ns) and root.findall('.//visio:cp',ns) do similar things. The only difference is that using './/visio:Text/' in method 1 will also extract the tail for <pp> if it is available, which may be undesirable! In that case './/visio:cp', like you suggested, is the way to go.
@fcento a lot of thanks for your kind help Francesco :))
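The two lookups and the None check can be sketched roughly as follows. This is not the gist itself: the sample XML is a cut-down version of the posted document with an extra empty <Text> added to trigger the None case, tails is a hypothetical helper, and the path is written as './/visio:Text/*' to spell out "all children of Text" explicitly.

```python
import xml.etree.ElementTree as ET

ns = {"visio": "urn:schemas-microsoft-com:office:visio"}

# Cut-down sample; the last <Text> has no trailing string, so cp.tail is None.
xml_text = (
    "<VisioDocument xmlns='urn:schemas-microsoft-com:office:visio'>"
    "<Text><pp IX='0'/><cp IX='0'/>Entity</Text>"
    "<Text><pp IX='0'/><cp IX='0'/>EntityTwo</Text>"
    "<Text><pp IX='0'/><cp IX='0'/></Text>"
    "</VisioDocument>"
)
root = ET.fromstring(xml_text)


def tails(elements):
    """Collect non-empty tails, skipping elements whose tail is None."""
    out = []
    for elm in elements:
        if elm.tail is not None and elm.tail.strip():
            out.append(elm.tail.strip())
    return out


# Method 1: every child of <Text> (would also pick up a tail on <pp>).
method_1 = tails(root.findall(".//visio:Text/*", ns))
# Method 2: only the <cp> elements.
method_2 = tails(root.findall(".//visio:cp", ns))

print(method_1)
print(method_2)
```

On this sample both methods give the same list; they would differ only if a <pp> element carried its own tail text.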
Hi. I have this xmlns and i tried every solution and i can't read the XML with this namespace: <VisioDocument xmlns='urn:schemas-microsoft-com:office:visio'> ..... </VisioDocument> Some idea to solve this? :/ it is a xml, but the extension file is .vdx
Not enough information. What error do you get?
@fcento hi, thank you very much for your contributions. I don't know why it didn't work a while ago, but now it works; in the end I did it with the solution that you recommended at 14:20.
Thank you! Spent almost two days trying to resolve a namespace issue, and then found this video, thank you.
thank you for this
Hi Francesco, In my xml closing tag is missing for example, Current xml - <data> <country name="Liechtenstein"> <rank>1</rank> <year>2008</year> <gdppc>141100</gdppc> <neighbor name="Austria" direction="E"/> <neighbor name="Switzerland" direction="W"/> </country> Expected xml - <data> <country name="Liechtenstein"> <rank>1</rank> <year>2008</year> <gdppc>141100</gdppc> <neighbor name="Austria" direction="E"/> <neighbor name="Switzerland" direction="W"/> </country> </data> Can you please help me with this?
If the closing tag is missing, it is technically an invalid file. You may have to manually close it yourself as there is no way for the code to know where it should be closed.
Vishal, you can use regex to find missing closed tag and then just replace it with desired closing tag
thanks a lot
Happy to help
I know it's a bit late, but you can correct the XML layout after adding a new subelement by using the indent() function of ElementTree. This refreshes the automatic indent for the whole tree (or element). You may need to specify "space = '    '" (4 spaces) to override the default setting, to match the default layout for the dump method. ET.indent(tree, space = '    ') # for entire tree ET.indent(elm, space = '    ', level = 1) # for just the element, although I did have problems with inconsistent spacing trying to just indent the element!! :O
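A small sketch of the whole-tree case, assuming Python 3.9 or later (where ET.indent() was added) and a made-up sample document:

```python
import xml.etree.ElementTree as ET

# Made-up sample, written on one line with no whitespace between elements.
root = ET.fromstring(
    "<data><country name='Liechtenstein'><rank>1</rank></country></data>"
)

# Add a new subelement; without re-indenting it would sit on the same line.
country = root.find("country")
ET.SubElement(country, "year").text = "2008"

# Re-indent the whole tree in place, four spaces per level.
ET.indent(root, space="    ")

pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Indenting from the root keeps every level consistent, which sidesteps the spacing problems mentioned for indenting a single element.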
Thanks, it's very clear, most of the tutorials cover only the non namespaces xml
Super helpful, thanks a lot!
Brilliant !!!
Hey man! You just helped me to finish my parser! I do not know how, but I finally was able to find and change a text in each specific element in my XML file. I still do not understand why I have to register the namespace, but it did the work perfectly. Your video was really helpful. I didn't understand 70% of it the first time, but now it's clearer to me. Thanks man!
Glad I could help!
Merging XML files using Python: can you please make a video on this topic?
That was SO helpful Dude. Thank you so much.
Hi Francesco Cento. I would like to know how to implement below two use cases 1) Incorrect type of data inside an element, for e.g string inside an element that is supposed to have an integer. 2) Missing element: An element that must be present according to XSD is not present in the XML. Could you please suggest any idea on this ? thanks
Rupesh, for your use case you may have to refer to a different library called "LXML", which has XML schema (XSD) support. I'm not experienced with this, but looking at the documentation, it has an example of how to construct a validator, which should address both your issues.
@fcento Thank you for your suggestion. I will look into that more.
This was a huge help. Thank you!
Superb, exactly what I needed to know. Thank you
Increase your font size before doing tutorials. It's quite hard to read the text. Anyway, good job.