How do I hide images that have a certain class when creating a pdf from html?

Tags: c#,html,css,regex,itextsharp

Problem :

I am having an issue trying to hide image elements that contain a certain class when converting the html to pdf, using iTextSharp (5.x).

I do not have control over the original HTML, as it comes from another source. However, I can do basic things like Regex and string.Replace in C# after I receive it.

A simple example of the Html string would be something like this:

        <img src="somepath/desktop.jpg" class="img-desktop">Desktop</img>
        <img src="somepath/mobile.jpg" class="img-mobile">Mobile</img>

This string is then converted into a PDF using XMLWorker in iTextSharp.

I need to hide the second image and, more generically, any image element with the "img-mobile" class.

What I've tried:

  • Add img.img-mobile {display:none} to the CSS that is sent in when creating the pdf
  • Add img.img-mobile {width:0;height:0} to the CSS
  • Add @media print { img.img-mobile: display:none} to the CSS
  • Add @media print { img.img-mobile: width:0;height:0} to the CSS
  • Use Regex to find img elements with that class, then loop through the matches, replace the source with an empty source, and replace the original HTML of that element with the new string (my Regex isn't grabbing any matches, unfortunately)

            var pattern = "<img.*?class=\"img-mobile.*\"\\s?>.*</img>";
            var mobileImages = Regex.Matches(innerHtml, pattern);
            var srcPattern = "src=\".*\" ";
            foreach (var imageElement in mobileImages)
            {
                var replaceString = Regex.Replace(imageElement.ToString(), srcPattern, " ");
                innerHtml.Replace(imageElement.ToString(), replaceString);
            }

I am quickly running out of ideas on how to handle this. The only saving grace is that the incoming HTML is consistent, since a tool generates it somewhere else. When a user adds an image to that HTML, it will always be structured the same way, so Regex and replace methods are acceptable, although a CSS method would be strongly preferred.

Solution :

Even if you're a Regex expert and your input is predictable as mentioned, parsing HTML with regular expressions is hard. A better and easier way is to use a tested, proven parser, which is available in pretty much every programming language. For .NET it's HtmlAgilityPack. If you know a bit of XPath, which is quite similar to CSS selectors, it's pretty simple to set up and select the specific nodes you want to remove:

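string RemoveImage(string htmlToParse)
{
    var hDocument = new HtmlDocument()
    {
        OptionWriteEmptyNodes = true,
        OptionAutoCloseOnEnd = true
    };
    hDocument.LoadHtml(htmlToParse);
    var root = hDocument.DocumentNode;
    // Class to remove; use 'img-mobile' for the case in the question.
    var imagesDesktop = root.SelectNodes("//img[@class='img-desktop']");
    if (imagesDesktop == null) return root.WriteTo(); // SelectNodes returns null when nothing matches
    foreach (var image in imagesDesktop)
    {
        var imageText = image.NextSibling; // the text that follows the <img> tag
        imageText?.Remove();
        image.Remove();
    }
    return root.WriteTo();
}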

And then pass your parsed HTML to iTextSharp:

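var parsedHtml = RemoveImage(HTML);
using (var xmlSnippet = new StringReader(parsedHtml))
using (FileStream stream = new FileStream(
    outputPath, FileMode.Create)) // outputPath: placeholder for your target PDF file
using (var document = new Document())
{
    PdfWriter writer = PdfWriter.GetInstance(
        document, stream
    );
    document.Open();
    XMLWorkerHelper.GetInstance().ParseXHtml(
        writer, document, xmlSnippet
    );
}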
This works for me with the HTML snippet you provided.

UPDATE, after comment about 'approved' code:

Aah, the dreaded CCB. Know how that goes. :( If HtmlAgilityPack doesn't pass, here's an alternate solution, although it's probably not the best Regex ever written. ;)

const string HTML = @"
    <p class='img-desktop'>Paragraph</p>
        <img src='somepath/desktop.jpg' class='img-desktop'>Desktop</img>
        <img src='somepath/mobile.jpg' class='img-mobile'>Mobile</img>
        <img src='somepath/desktop.jpg' alt='img-desktop' title='img-desktop' class=""img-desktop"">Desktop
        <img src='somepath/mobile.jpg' class='img-mobile'>Mobile</img>
";

public void Go()
{
    var regex = new Regex(
        // initial update
        // @"<img[^>]*class='?""?'?img-desktop""?[^>]*>.*?</img>",

        // after seeing accepted answer, noticed a bad copy/paste.
        // above works, but it can also be shortened to this, which works too
        @"<img[^>]*class=[^>]*img-desktop[^>]*>.*?</img>",
        RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.Singleline
    );
    Console.WriteLine(regex.Replace(HTML, ""));
}

The Regex gives you a little extra leeway in case the actual HTML you're dealing with isn't exactly as posted above.
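
For the case in the question (dropping img-mobile rather than img-desktop), the same idea can be wrapped in a small helper. This is only a sketch: the class name ImageStripper and the method RemoveImagesWithClass are made up for illustration, and it assumes the generated HTML always uses the `<img ...>text</img>` shape shown in the question. An `<img>` without a closing `</img>` would cause the match to run on to the next `</img>`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ImageStripper
{
    // Hypothetical helper (not from the original posts): removes every <img>
    // element whose class attribute contains the given class name, together
    // with its trailing text and </img> closing tag.
    public static string RemoveImagesWithClass(string html, string cssClass)
    {
        // Regex.Escape keeps regex metacharacters (such as '.') in the class name literal.
        // The lookarounds stop "img-mobile" from matching inside "img-mobile-large".
        var pattern =
            @"<img\b[^>]*\bclass\s*=\s*(['""])[^'""]*(?<![\w-])"
            + Regex.Escape(cssClass)
            + @"(?![\w-])[^'""]*\1[^>]*>.*?</img>";

        return Regex.Replace(html, pattern, "",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }

    public static void Main()
    {
        const string html =
            "<img src=\"somepath/desktop.jpg\" class=\"img-desktop\">Desktop</img>" +
            "<img src=\"somepath/mobile.jpg\" class=\"img-mobile\">Mobile</img>";

        // Only the desktop image should remain in the printed markup.
        Console.WriteLine(RemoveImagesWithClass(html, "img-mobile"));
    }
}
```

The cleaned string can then be handed to XMLWorker in place of the original HTML. Accepting either quote style around the class value keeps some of the leeway mentioned above, while the lookarounds avoid removing images whose class merely starts with the target name.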
