How do I hide images that have a certain class when creating a pdf from html?


Tags: c#,html,css,regex,itextsharp

Problem :

I am having trouble hiding image elements that have a certain class when converting HTML to PDF with iTextSharp (5.x).

I have no control over the original HTML, as it comes from another source. However, I can do basic things such as Regex and string.Replace in C# once I receive it.

A simple example of the Html string would be something like this:

<div>
    <div>
        <img src="somepath/desktop.jpg" class="img-desktop">Desktop</img>
        <img src="somepath/mobile.jpg" class="img-mobile">Mobile</img>
    </div>
</div>

This string is then converted into a PDF using XMLWorker in iTextSharp.

I need to hide the second image and, more generically, any image element with the "img-mobile" class.

What I've tried:

  • Add img.img-mobile {display:none} to the CSS that is sent in when creating the pdf
  • Add img.img-mobile {width:0;height:0} to the CSS
  • Add @media print { img.img-mobile: display:none} to the CSS
  • Add @media print { img.img-mobile: width:0;height:0} to the CSS
  • Use Regex to find img elements with that class, loop through the matches, replace each src with an empty value, and replace the original markup in the HTML string with the result (my Regex isn't grabbing any matches, unfortunately)

            var pattern = "<img.*?class=\"img-mobile.*\"\\s?>.*</img>";
            var mobileImages = Regex.Matches(innerHtml, pattern);
            var srcPattern = "src=\".*\" ";
            foreach (var imageElement in mobileImages)
            {
    
                var replaceString = Regex.Replace(imageElement.ToString(), srcPattern, " ");
                innerHtml.Replace(imageElement.ToString(), replaceString);
            }
    

I am quickly running out of ideas on how to handle this... The only saving grace is that the incoming HTML is consistent, since a tool generates it somewhere else. When a user "adds an image to that html" it will always be structured the same way, so Regex and replace methods are acceptable, although a CSS method would be much preferred...



Solution :

Even if you're a Regex expert and your input is predictable, as you mention, parsing HTML with Regex is hard. A better and easier way is to use a tested, proven parser; one is available for pretty much every programming language. For .NET it's HtmlAgilityPack. If you know a bit of XPath, which is quite similar to CSS selectors, it's pretty simple to set up and select the specific nodes you want to remove:

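// Requires the HtmlAgilityPack NuGet package (using HtmlAgilityPack;).
string RemoveImage(string htmlToParse)
{
    var hDocument = new HtmlDocument()
    {
        OptionWriteEmptyNodes = true,
        OptionAutoCloseOnEnd = true
    };
    hDocument.LoadHtml(htmlToParse);
    var root = hDocument.DocumentNode;
    // The question asks to drop the images with the "img-mobile" class.
    var imagesMobile = root.SelectNodes("//img[@class='img-mobile']");
    // SelectNodes returns null (not an empty collection) when nothing matches.
    if (imagesMobile != null)
    {
        foreach (var image in imagesMobile)
        {
            // <img> is parsed as an empty element, so its caption text
            // ("Mobile") is the node that follows it.
            var imageText = image.NextSibling;
            if (imageText != null) imageText.Remove();
            image.Remove();
        }
    }
    return root.WriteTo();
}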

And then pass your parsed HTML to iTextSharp:

// Namespaces used: System.IO, iTextSharp.text, iTextSharp.text.pdf, iTextSharp.tool.xml
var parsedHtml = RemoveImage(HTML);
using (var xmlSnippet = new StringReader(parsedHtml))
{
    using (FileStream stream = new FileStream(
        outputFile,
        FileMode.Create,
        FileAccess.Write))
    {
        using (var document = new Document())
        {
            PdfWriter writer = PdfWriter.GetInstance(
                document, stream
            );
            document.Open();
            XMLWorkerHelper.GetInstance().ParseXHtml(
                writer, document, xmlSnippet
            );
        }
    }
}

This works for me with the HTML snippet you provided.
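
If the generator ever emits more than one class on an image (say class="img-mobile wide"), an exact-match XPath on @class would miss it. Here is a minimal sketch of a token-based match, assuming HtmlAgilityPack is referenced; the names ImageStripper and RemoveImagesByClass are made up for illustration, and cssClass is assumed to contain no quote characters:

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

static class ImageStripper
{
    // Hypothetical helper: removes every <img> whose class list contains cssClass.
    public static string RemoveImagesByClass(string html, string cssClass)
    {
        var doc = new HtmlDocument
        {
            OptionWriteEmptyNodes = true,
            OptionAutoCloseOnEnd = true
        };
        doc.LoadHtml(html);

        // Common XPath 1.0 idiom for "the class list contains this token".
        var xpath = "//img[contains(concat(' ', normalize-space(@class), ' '), ' "
                    + cssClass + " ')]";

        var images = doc.DocumentNode.SelectNodes(xpath);
        if (images != null) // SelectNodes returns null when nothing matches
        {
            foreach (var image in images)
            {
                // Drop the text node directly after the image (its caption), if any.
                if (image.NextSibling is HtmlTextNode caption)
                    caption.Remove();
                image.Remove();
            }
        }
        return doc.DocumentNode.WriteTo();
    }

    static void Main()
    {
        var html = "<div><img src='a.jpg' class='img-desktop'>Desktop</img>"
                 + "<img src='b.jpg' class='img-mobile wide'>Mobile</img></div>";
        Console.WriteLine(RemoveImagesByClass(html, "img-mobile"));
    }
}
```

Passing the class name as a parameter keeps the same helper usable for any other class the generating tool might add later.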

UPDATE, after comment about 'approved' code:

Aah, the dreaded CCB. Know how that goes. :( If HtmlAgilityPack doesn't pass, here's an alternate solution, although it's probably not the best Regex ever written. ;)

const string HTML = @"
<div>
    <p class='img-desktop'>Paragraph</p>
    <div>
        <img src='somepath/desktop.jpg' class='img-desktop'>Desktop</img>
        <img src='somepath/mobile.jpg' class='img-mobile'>Mobile</img>
    </div>
    <div>
        <img src='somepath/desktop.jpg' alt='img-desktop' title='img-desktop' class=""img-desktop"">Desktop
</IMG>
        <img src='somepath/mobile.jpg' class='img-mobile'>Mobile</img>
    </div>
</div>";

public void Go()
{
    var regex = new Regex(
        // initial update
        // @"<img[^>]*class='?""?'?img-desktop""?[^>]*>.*?</img>",

        // after seeing accepted answer, noticed a bad copy/paste.
        // above works, but for readability should have been this:
        @"<img[^>]*class='?""?img-desktop""?'?[^>]*>.*?</img>",
        // and also noticed above can be shortened to this, which works too
        // @"<img[^>]*class=[^>]*img-desktop[^>]*>.*?</img>"
        RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.Singleline
    );
    Console.WriteLine(regex.Replace(HTML, ""));
}

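The Regex gives you a little extra leeway in case the actual HTML you're dealing with isn't exactly as posted above: it tolerates single or double quotes around the class value, extra attributes, mixed-case tags, and captions that span lines. The pattern above targets img-desktop; swap in img-mobile to remove the mobile images from the question.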
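
For reuse, the same idea can be wrapped in a small helper that takes the class name as an argument. This is only a sketch: the names ImgRegexStripper and RemoveImages are invented here, it assumes every image is written with a closing </img> tag as in the sample HTML, and it matches the class name as a substring (so "img-mobile" would also hit "img-mobile-large"):

```csharp
using System;
using System.Text.RegularExpressions;

static class ImgRegexStripper
{
    // Hypothetical helper: strips <img ...>caption</img> blocks whose class
    // attribute mentions cssClass. Assumes a closing </img> is always present.
    public static string RemoveImages(string html, string cssClass)
    {
        // (['"]) captures the opening quote; \1 requires the same closing quote.
        var pattern = @"<img\b[^>]*\bclass\s*=\s*(['""])[^'"">]*"
                    + Regex.Escape(cssClass)
                    + @"[^'"">]*\1[^>]*>.*?</img>";
        return Regex.Replace(html, pattern, "",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }

    static void Main()
    {
        var html = "<div><img src=\"d.jpg\" class=\"img-desktop\">Desktop</img>"
                 + "<img src=\"m.jpg\" class=\"img-mobile\">Mobile</img></div>";
        Console.WriteLine(RemoveImages(html, "img-mobile"));
        // prints: <div><img src="d.jpg" class="img-desktop">Desktop</img></div>
    }
}
```

Regex.Escape guards against class names that contain Regex metacharacters, and because [^>]* cannot cross a tag boundary, a match that starts at one image cannot reach the class attribute of the next.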

