How do I hide images that have a certain class when creating a PDF from HTML?


Tags: c#,html,css,regex,itextsharp

Problem :

I am having trouble hiding image elements that have a certain class when converting HTML to PDF using iTextSharp (5.x).

I have no control over the original HTML, as it comes from another source. However, I can do basic things like Regex and string.Replace in C# once I receive it.

A simple example of the HTML string would be something like this:

<div>
    <div>
        <img src="somepath/desktop.jpg" class="img-desktop">Desktop</img>
        <img src="somepath/mobile.jpg" class="img-mobile">Mobile</img>
    </div>
</div>

This string is then converted to a PDF using the XMLWorker in iTextSharp.

I need to hide the second image and, more generically, any image element with the "img-mobile" class.

What I've tried:

  • Add img.img-mobile {display:none} to the CSS that is sent in when creating the pdf
  • Add img.img-mobile {width:0;height:0} to the CSS
  • Add @media print { img.img-mobile: display:none} to the CSS
  • Add @media print { img.img-mobile: width:0;height:0} to the CSS
  • Use Regex to find img elements with that class, loop through the matches, blank out each src attribute, and replace the original element in the HTML with the modified one (unfortunately, my Regex isn't finding any matches)

            var pattern = "<img.*?class=\"img-mobile.*\"\\s?>.*</img>";
            var mobileImages = Regex.Matches(innerHtml, pattern);
            var srcPattern = "src=\".*\" ";
            foreach (var imageElement in mobileImages)
            {
    
                var replaceString = Regex.Replace(imageElement.ToString(), srcPattern, " ");
                innerHtml.Replace(imageElement.ToString(), replaceString);
            }
    

I am quickly running out of ideas on how to handle this. The only saving grace is that the incoming HTML is consistent, since a tool generates it elsewhere. When a user adds an image to that HTML, it will always be structured the same way, so Regex and replace methods are acceptable, although a CSS-based method would be much preferred.



Solution :

Even if you're a Regex expert and your input is predictable, as you mention, parsing HTML with Regex is hard. A better and easier way is to use a tested, proven parser; one is available for pretty much every programming language. For .NET it's HtmlAgilityPack. If you know a bit of XPath, which is quite similar to CSS selectors, it's pretty simple to set up and select the specific nodes you want to remove:

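using HtmlAgilityPack;

string RemoveImage(string htmlToParse)
{
    var hDocument = new HtmlDocument()
    {
        OptionWriteEmptyNodes = true,
        OptionAutoCloseOnEnd = true
    };
    hDocument.LoadHtml(htmlToParse);
    var root = hDocument.DocumentNode;
    // The question asks to drop the img-mobile images.
    // This XPath matches only when the class attribute is exactly "img-mobile".
    var mobileImages = root.SelectNodes("//img[@class='img-mobile']");
    // SelectNodes returns null, not an empty collection, when nothing matches.
    if (mobileImages != null)
    {
        foreach (var image in mobileImages)
        {
            // HtmlAgilityPack treats <img> as an empty element, so the label
            // text ("Mobile") ends up as the following sibling text node.
            var imageText = image.NextSibling;
            if (imageText != null && imageText.NodeType == HtmlNodeType.Text)
            {
                imageText.Remove();
            }
            image.Remove();
        }
    }
    return root.WriteTo();
}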

And then pass your parsed HTML to iTextSharp:

// Requires: using System.IO; using iTextSharp.text;
// using iTextSharp.text.pdf; using iTextSharp.tool.xml;
// HTML is the incoming markup; outputFile is the path of the PDF to create.
var parsedHtml = RemoveImage(HTML);
using (var xmlSnippet = new StringReader(parsedHtml))
using (var stream = new FileStream(outputFile, FileMode.Create, FileAccess.Write))
using (var document = new Document())
{
    PdfWriter writer = PdfWriter.GetInstance(document, stream);
    document.Open();
    XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, xmlSnippet);
}

This works for me with the HTML snippet you provided.
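One caveat on the "XPath is similar to CSS selectors" point: an exact attribute test such as [@class='...'] only matches when the class attribute holds that single value, whereas the CSS selector img.some-class matches any element carrying that class among others. The sketch below shows the usual XPath 1.0 idiom for token matching. It runs against well-formed XML with the XPath support built into .NET so it has no external dependency; ClassTokenXPath, ForImageClass and CountMatches are hypothetical names, and passing the same expression string to HtmlAgilityPack's SelectNodes is an assumption I have not tested here.

```csharp
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class ClassTokenXPath
{
    // XPath 1.0 counterpart of the CSS selector img.<className>:
    // pad the class attribute with spaces, then look for " className ".
    public static string ForImageClass(string className)
    {
        return "//img[contains(concat(' ', normalize-space(@class), ' '), ' "
            + className + " ')]";
    }

    // Counts the matching <img> elements in a well-formed XML fragment.
    public static int CountMatches(string wellFormedXml, string className)
    {
        var doc = XDocument.Parse(wellFormedXml);
        return doc.XPathSelectElements(ForImageClass(className)).Count();
    }
}
```

With this expression, an element with class='img-mobile big' is matched for img-mobile, while one with class='img-mobile-wide' is not.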

UPDATE, after comment about 'approved' code:

Aah, the dreaded CCB. Know how that goes. :( If HtmlAgilityPack doesn't pass, here's an alternate solution, although it's probably not the best Regex ever written. ;)

const string HTML = @"
<div>
    <p class='img-desktop'>Paragraph</p>
    <div>
        <img src='somepath/desktop.jpg' class='img-desktop'>Desktop</img>
        <img src='somepath/mobile.jpg' class='img-mobile'>Mobile</img>
    </div>
    <div>
        <img src='somepath/desktop.jpg' alt='img-desktop' title='img-desktop' class=""img-desktop"">Desktop
</IMG>
        <img src='somepath/mobile.jpg' class='img-mobile'>Mobile</img>
    </div>
</div>";

// Requires: using System; using System.Text.RegularExpressions;
public void Go()
{
    var regex = new Regex(
        // Matches an <img> whose class value starts with img-desktop,
        // whether the value is quoted with ' or " or not quoted at all.
        @"<img[^>]*class='?""?img-desktop""?'?[^>]*>.*?</img>",
        // A shorter alternative that works on the sample above too. It is
        // looser: img-desktop may appear anywhere in the tag after class=.
        // @"<img[^>]*class=[^>]*img-desktop[^>]*>.*?</img>"
        RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.Singleline
    );
    Console.WriteLine(regex.Replace(HTML, ""));
}

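The Regex gives you a little extra leeway in case the actual HTML you're dealing with isn't exactly as posted above: it tolerates single, double, or missing quotes around the class value, mixed-case tags, and line breaks inside the element. To remove the mobile images from the question instead, replace img-desktop with img-mobile in the pattern.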
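For the question's case, here is a minimal sketch of the same pattern aimed at the img-mobile class and wrapped in a helper. MobileImageStripper and Strip are hypothetical names. Note that .NET strings are immutable, so the cleaned HTML has to be taken from the return value; calling Replace and ignoring the result leaves the original string unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MobileImageStripper
{
    // Same shape as the pattern above, with the class name swapped to img-mobile.
    private static readonly Regex MobileImage = new Regex(
        @"<img[^>]*class='?""?img-mobile""?'?[^>]*>.*?</img>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns a new string with every matching <img>...</img> removed.
    public static string Strip(string html)
    {
        return MobileImage.Replace(html, string.Empty);
    }
}
```

Usage would look like innerHtml = MobileImageStripper.Strip(innerHtml); before handing the string to XMLWorker.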

